• Title/Summary/Keyword: Sequence Classification

A Design of Index/XML Sequence Relation Information System for Product Abstraction and Classification (산출물 추출 및 분류를 위한 Index/XML순서관계 시스템 설계)

  • Sun Su-Kyun
    • The KIPS Transactions:PartD
    • /
    • v.12D no.1 s.97
    • /
    • pp.111-120
    • /
    • 2005
  • Software development produces many kinds of products, such as class components, class diagrams, forms, objects, and design patterns. This paper therefore proposes an Index/XML Sequence Relation information system for product abstraction and classification: a system for abstracting the sequence relations of design products that can store and reuse design patterns, together with their pattern relation information, in a meta-modeling database. The Index/XML Sequence Relation system makes it easy to change the various relation information of products for abstraction and classification. The system is designed to extract and classify design patterns efficiently through functional indexing, sequence-based indexing for standard patterns, code indexing that converts patterns into code, and grouping by Index-ID code; role information can be applied through structural extraction and the design pattern indexing process. The system manages various products, including class items, diagrams, forms, components, and design patterns.

A K-Nearest Neighbor Algorithm for Categorical Sequence Data (범주형 시퀀스 데이터의 K-Nearest Neighbor알고리즘)

  • Oh Seung-Joon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.2 s.34
    • /
    • pp.215-221
    • /
    • 2005
  • Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web logs. Such datasets consist of sequence data with an inherently sequential nature. In this paper, we study how to classify these sequence datasets. There are several kinds of techniques for data classification, such as decision tree induction, Bayesian classification, and K-NN. In our approach, we use a K-NN algorithm to classify sequences. In addition, we propose a new similarity measure between two sequences and an efficient method for computing it.
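The abstract does not spell out the proposed similarity measure, so the following is only a minimal sketch of K-NN over categorical sequences. It assumes a stand-in similarity (longest-common-subsequence length divided by the longer sequence's length); `lcs_length`, `similarity`, and `knn_classify` are hypothetical names, not the paper's.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Hypothetical similarity: LCS length normalised by the longer sequence."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def knn_classify(query: Sequence[Hashable],
                 training: List[Tuple[Sequence[Hashable], str]],
                 k: int = 3) -> str:
    """Label the query by majority vote among the k most similar training sequences."""
    neighbours = sorted(training, key=lambda item: similarity(query, item[0]),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

For example, with training pairs `("ABCD", "x")`, `("ABCE", "x")`, `("WXYZ", "y")`, `("WXYA", "y")`, the query `"ABCF"` with `k=3` is labelled `"x"`. Any other sequence similarity (for example, an edit-distance-based one) could be swapped into `similarity` without changing the voting logic.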

Image Sequence Compression based on Adaptive Classification of Interframe Difference Image Blocks (프레임간 차영상 블록의 적응분류에 의한 영상시퀀스 압축)

  • Ahn, Chul-Joon;Kong, Seong-Gon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.8 no.6
    • /
    • pp.122-128
    • /
    • 1998
  • This paper presents compression of image sequences based on the classification of interframe difference image blocks. The classification process consists of image activity classification and energy distribution classification. In the activity classification, interframe difference image blocks are classified into activity blocks and non-activity blocks using edge detection. In the distribution classification, activity blocks are further classified into vertical blocks, horizontal blocks, and small activity blocks using AC energy distribution features. The RBFN (radial basis function network), trained with the numerical classification results, successfully classifies difference image blocks according to image details. Image sequence compression based on RBFN classification of interframe difference image blocks shows better compression results and shorter training time than the classical sorting method and the MLP network.
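The two-stage rule-based labelling described above can be sketched as follows. This sketch covers only the numerical classification that produces the training labels, not the RBFN. Several parts are assumptions rather than the paper's method: the gradient-threshold edge test, the thresholds (`edge_threshold`, `min_edge_pixels`, `ratio`), and the use of a 2-D DCT-II to measure AC energy. All function names are hypothetical.

```python
import math
from typing import List

Block = List[List[float]]


def difference_block(prev: Block, curr: Block) -> Block:
    """Interframe difference: current block minus the co-located previous block."""
    return [[c - p for p, c in zip(pr, cr)] for pr, cr in zip(prev, curr)]


def dct2(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of an N x N block (direct form; fine for 8x8).
    coeffs[u][v]: u indexes vertical frequency (rows), v horizontal (columns)."""
    n = len(block)

    def alpha(k: int) -> float:
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * s
    return coeffs


def is_active(block: Block, edge_threshold: float = 20.0, min_edge_pixels: int = 4) -> bool:
    """Assumed edge test: count pixels whose forward-difference gradient is large."""
    n = len(block)
    edges = 0
    for x in range(n):
        for y in range(n):
            gx = block[x][y + 1] - block[x][y] if y + 1 < n else 0.0
            gy = block[x + 1][y] - block[x][y] if x + 1 < n else 0.0
            if math.hypot(gx, gy) > edge_threshold:
                edges += 1
    return edges >= min_edge_pixels


def classify_block(block: Block, ratio: float = 0.6) -> str:
    """Label a difference block by where its AC energy is concentrated."""
    if not is_active(block):
        return "non-activity"
    c = dct2(block)
    n = len(c)
    total_ac = sum(c[u][v] ** 2 for u in range(n) for v in range(n)) - c[0][0] ** 2
    row_energy = sum(c[0][v] ** 2 for v in range(1, n))  # varies left-right: vertical edges
    col_energy = sum(c[u][0] ** 2 for u in range(1, n))  # varies top-bottom: horizontal edges
    if total_ac <= 0:
        return "small-activity"
    if row_energy / total_ac >= ratio:
        return "vertical"
    if col_energy / total_ac >= ratio:
        return "horizontal"
    return "small-activity"
```

A block whose left half differs from its right half puts almost all AC energy in the first DCT row and is labelled `"vertical"`. Its transpose is labelled `"horizontal"`, and an all-zero difference block is `"non-activity"`.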

Light-weight Classification Model for Android Malware through the Dimensional Reduction of API Call Sequence using PCA

  • Jeon, Dong-Ha;Lee, Soo-Jin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.11
    • /
    • pp.123-130
    • /
    • 2022
  • Recently, studies on the detection and classification of Android malware based on API call sequences have been actively carried out. However, API call sequence based malware classification has serious limitations, such as excessive time and resource consumption for malware analysis and learning model construction, due to the vast amount of data and the high dimensionality of the features. In this study, we significantly reduced the feature dimension using PCA (Principal Component Analysis) on the CICAndMal2020 dataset, which contains vast API call information, and then analyzed various classification models such as LightGBM, Random Forest, and k-Nearest Neighbors. The experimental results show that PCA significantly reduces the feature dimension while maintaining the characteristics of the original data and achieves efficient malware classification performance. Both binary and multi-class classification achieve higher accuracy than previous studies, even when the features are reduced to less than 1% of their original size.
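The "reduce with PCA, then classify" pipeline can be illustrated with a dependency-free sketch. It computes principal axes by power iteration with deflation on the sample covariance matrix, then runs k-nearest neighbours in the reduced space. This is only one way to compute PCA. The study itself used the CICAndMal2020 features together with LightGBM, Random Forest, and k-NN, none of which are reproduced here. The function names and the toy data in the test are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def fit_pca(X: Matrix, k: int, iters: int = 500, seed: int = 0) -> Tuple[Vector, Matrix]:
    """Return (feature means, top-k principal axes) via power iteration with deflation."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components: Matrix = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm0 = math.sqrt(sum(x * x for x in v))
        v = [x / norm0 for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left along any direction
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflate: remove the variance already explained by this axis.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, components


def project(x: Sequence[float], means: Vector, components: Matrix) -> Vector:
    """Coordinates of x in the reduced k-dimensional space."""
    centred = [xi - mi for xi, mi in zip(x, means)]
    return [sum(c * a for c, a in zip(centred, axis)) for axis in components]


def knn_predict(x: Sequence[float], train_z: List[Vector], labels: List[str],
                means: Vector, components: Matrix, k: int = 1) -> str:
    """Majority label of the k nearest reduced training points (Euclidean distance)."""
    z = project(x, means, components)
    dists = sorted((math.dist(z, t), lab) for t, lab in zip(train_z, labels))
    return Counter(lab for _, lab in dists[:k]).most_common(1)[0][0]
```

Training points are projected once with `project` and reused, so every later distance is computed in `k` dimensions instead of the original `d`. This is where the time and memory savings come from.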

Malware Classification using Dynamic Analysis with Deep Learning

  • Asad Amin;Muhammad Nauman Durrani;Nadeem Kafi;Fahad Samad;Abdul Aziz
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.8
    • /
    • pp.49-62
    • /
    • 2023
  • The creation and alteration of new malware samples has increased rapidly, posing a major financial risk for many organizations. There is strong demand for better classification and detection mechanisms: older strategies such as classification with machine learning algorithms proved useful but do not perform well when features must be extracted automatically at scale. To overcome this, malware must be analyzed automatically through an automatic feature extraction process. For this purpose, dynamic analysis of real malware executables was performed to extract useful features such as API call sequences and opcode sequences. Different hashing techniques were analyzed for converting these features into images, which allows more advanced deep learning approaches to classify large numbers of images. Deep learning algorithms such as convolutional neural networks (CNNs) can then classify malware through its image representation. When converted to grayscale and fed into the CNN, these images perform comparatively well under dynamic changes in malware code, because such changes alter a grayscale image sample by only a few pixels. In this work, we used the VGG-16 CNN architecture for experimentation.
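The abstract does not say which hashing scheme was chosen, so here is a minimal sketch of one possible sequence-to-grayscale encoding, under stated assumptions. Each API call name is hashed with SHA-256, the first `bytes_per_call` digest bytes become pixel intensities (0 to 255), and the pixels are laid out in rows of `width`, with zero (black) padding. The function name and both parameters are hypothetical. Resizing to the VGG-16 input size is not shown.

```python
import hashlib
from typing import List, Sequence


def sequence_to_grayscale(calls: Sequence[str], width: int = 16,
                          bytes_per_call: int = 4) -> List[List[int]]:
    """Hypothetical encoding: each call -> leading bytes of its SHA-256 digest -> pixels."""
    pixels: List[int] = []
    for call in calls:
        digest = hashlib.sha256(call.encode("utf-8")).digest()
        pixels.extend(digest[:bytes_per_call])  # iterating bytes yields ints 0..255
    # Pad with black pixels so the final row is complete.
    if len(pixels) % width:
        pixels.extend([0] * (width - len(pixels) % width))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]
```

Because each call maps deterministically to a fixed pixel pattern, inserting or replacing a few calls changes only the corresponding pixels. This is the kind of robustness to small code changes that the abstract attributes to the grayscale representation.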

Identification of Viral Taxon-Specific Genes (VTSG): Application to Caliciviridae

  • Kang, Shinduck;Kim, Young-Chang
    • Genomics & Informatics
    • /
    • v.16 no.4
    • /
    • pp.23.1-23.5
    • /
    • 2018
  • Virus taxonomy was initially determined by clinical experiments based on phenotype. However, with the development of sequence analysis methods, genotype-based classification was also applied. With the development of genome sequence analysis technology, there is an increasing demand for virus taxonomy to be extended from in vivo and in vitro to in silico. In this study, we verified the consistency of the current International Committee on Taxonomy of Viruses (ICTV) taxonomy using an in silico approach, aiming to identify the specific sequence for each virus. We applied this approach to norovirus in Caliciviridae, which causes 90% of gastroenteritis cases worldwide. Based on the dogma that "protein structure determines its function," we hypothesized that a specific sequence can be identified by its specific structure. First, we extracted the coding regions (CDS). Second, the CDS protein sequences of each genus were annotated by a Conserved Domain Database (CDD) search. Finally, the conserved domains of each genus in Caliciviridae were classified by RPS-BLAST against CDD. The analysis showed that the members of Caliciviridae share sequences including RNA helicase. For Norovirus, the calicivirus coat protein C-terminal and viral polyprotein N-terminal domains appear as specific domains within Caliciviridae; they are not found in the other genera of Caliciviridae. If this method is used to detect specific conserved domains, those domains can serve as classification keywords based on protein functional structure. After the specific protein domains are determined, their sequences can be converted to gene sequences, which could then be reused as viral biomarkers.

Malware Classification Possibility based on Sequence Information (순서 정보 기반 악성코드 분류 가능성)

  • Yun, Tae-Uk;Park, Chan-Soo;Hwang, Tae-Gyu;Kim, Sung Kwon
    • Journal of KIISE
    • /
    • v.44 no.11
    • /
    • pp.1125-1129
    • /
    • 2017
  • LSTM (Long Short-Term Memory) is a kind of RNN (Recurrent Neural Network) in which the next state is updated by remembering previous states. The call-sequence information of a malware sample can be defined as the system call function invoked at each time step. In this paper, we use the calling sequences of system calls in malware code as input for malware classification, exploiting the LSTM's ability to remember previous states. We run an experiment showing that our method can classify malware, and we measure accuracy while varying the length of the system call sequences.
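Before an LSTM can consume system-call traces, each trace has to become a fixed-length integer sequence, and the sequence length is exactly the parameter the experiment varies. The following is a minimal preprocessing sketch. It assumes id 0 is reserved for padding, id 1 for unseen calls, and that truncation keeps the first `length` calls. These choices and all names are hypothetical, and the LSTM model itself is not shown.

```python
from typing import Dict, List, Sequence

PAD, UNK = 0, 1  # reserved ids (assumed convention)


def build_vocab(traces: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign each distinct system call name an integer id, starting at 2."""
    vocab: Dict[str, int] = {}
    for trace in traces:
        for call in trace:
            if call not in vocab:
                vocab[call] = len(vocab) + 2
    return vocab


def encode(trace: Sequence[str], vocab: Dict[str, int], length: int) -> List[int]:
    """Truncate to the first `length` calls, map to ids, and right-pad with PAD."""
    ids = [vocab.get(call, UNK) for call in trace[:length]]
    return ids + [PAD] * (length - len(ids))


# Hypothetical sweep over sequence lengths; each batch would be fed to an LSTM:
# for length in (50, 100, 200):
#     batch = [encode(t, vocab, length) for t in traces]
```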

Phylogeny of Flavobacteria Group Isolated from Freshwater Using Multilocus Sequencing Analysis

  • Mun, Seyoung;Lee, Jungnam;Lee, Siwon;Han, Kyudong;Ahn, Tae-Young
    • Genomics & Informatics
    • /
    • v.11 no.4
    • /
    • pp.272-276
    • /
    • 2013
  • Sequence analysis of the 16S rRNA gene has been widely used for the classification of microorganisms. However, we have been unable to clearly identify five Flavobacterium species isolated from freshwater by using this gene as a single marker, because the evolutionary history is incomplete and the pace of DNA substitutions is relatively rapid in these bacteria. In this study, we tried to classify Flavobacterium species through multilocus sequence analysis (MLSA), which is a practical and reliable technique for the identification or classification of bacteria. The five Flavobacterium species isolated from freshwater and 37 other strains were classified based on six housekeeping genes: gyrB, dnaK, tuf, murG, atpA, and glyA. The genes were amplified by PCR and subjected to DNA sequencing. Based on the combined DNA sequence (4,412 bp) of the six housekeeping genes, we analyzed the phylogenetic relationships among the Flavobacterium species. The results indicate that MLSA based on the six housekeeping genes is a trustworthy method for identifying closely related Flavobacterium species.
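The "combined sequence" step of MLSA can be sketched as follows. For each strain, the aligned sequences of the six housekeeping genes are concatenated in a fixed order, and pairwise p-distances (the fraction of differing sites) are then computed on the concatenation. This assumes the per-gene alignment has already been done, and it skips gap sites, which is an assumed convention. Tree inference from the distance matrix is not shown, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

GENES = ["gyrB", "dnaK", "tuf", "murG", "atpA", "glyA"]


def concatenate(loci: Dict[str, str], genes: Sequence[str] = GENES) -> str:
    """Join one strain's aligned gene sequences in a fixed gene order."""
    return "".join(loci[g] for g in genes)


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping sites with a gap ('-') in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def distance_matrix(strains: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances between the concatenated multilocus sequences."""
    concat = {name: concatenate(loci) for name, loci in strains.items()}
    return {(s, t): p_distance(concat[s], concat[t])
            for s, t in combinations(sorted(concat), 2)}
```

Concatenating the loci before computing distances lets signal from all six genes contribute at once. This is the motivation the abstract gives for preferring MLSA over a single 16S rRNA marker.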

Negative Selection Algorithm for DNA Pattern Classification

  • Lee, Dong-Wook;Sim, Kwee-Bo
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집)
    • /
    • 2004.08a
    • /
    • pp.190-195
    • /
    • 2004
  • We propose a pattern classification algorithm using the self-nonself discrimination principle of immune cells and apply it to the DNA pattern classification problem. Pattern classification is a very important and frequently encountered problem in bioinformatics. In this paper, we propose a classification algorithm based on the negative selection of the immune system to classify DNA patterns. Negative selection is the process that determines antigen receptors that recognize antigens, that is, nonself cells. The immune cells use these antigen receptors to judge whether a cell is self or nonself. If one composes $\eta$ groups of antigen receptors for $\eta$ different patterns, these receptor groups can classify inputs into $\eta$ patterns. We propose a pattern classification algorithm based on negative selection at the nucleotide base level and the amino acid level. To show the validity of our algorithm, experimental results of RNA group classification are also presented.
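Negative selection for $\eta$ classes can be sketched at the nucleotide level as follows. For each class, random candidate detectors are generated, and any candidate that matches one of that class's own ("self") sequences is discarded. A sample is then assigned to the class whose detectors fire on it least often, that is, the class for which it looks most like self. The r-contiguous matching rule is a standard choice in the negative-selection literature, but here it is an assumption, as are all names and parameters. Amino-acid-level matching is not shown.

```python
import random
from typing import Dict, List, Sequence

ALPHABET = "ACGT"


def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """True if a and b agree in at least r contiguous aligned positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set: Sequence[str], length: int, r: int, count: int,
                       rng: random.Random, max_tries: int = 100_000) -> List[str]:
    """Negative selection: keep random candidates that match no self sequence."""
    detectors: List[str] = []
    tries = 0
    while len(detectors) < count and tries < max_tries:
        tries += 1
        cand = "".join(rng.choice(ALPHABET) for _ in range(length))
        if not any(r_contiguous_match(cand, s, r) for s in self_set):
            detectors.append(cand)
    return detectors


def train(classes: Dict[str, List[str]], length: int, r: int, count: int,
          seed: int = 0) -> Dict[str, List[str]]:
    """One detector group per class, with that class's sequences treated as self."""
    rng = random.Random(seed)
    return {label: generate_detectors(seqs, length, r, count, rng)
            for label, seqs in classes.items()}


def classify(sample: str, detector_sets: Dict[str, List[str]], r: int) -> str:
    """Pick the class whose detectors fire least often on the sample."""
    scores = {label: sum(r_contiguous_match(sample, d, r) for d in dets)
              for label, dets in detector_sets.items()}
    return min(scores, key=scores.get)
```

A self sequence can never trigger its own class's detectors, by construction. The matching threshold `r` therefore controls how strict the self/nonself boundary is.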

Fuzzy-based Threshold Controlling Method for ART1 Clustering in GPCR Classification (GPCR 분류에서 ART1 군집화를 위한 퍼지기반 임계값 제어 기법)

  • Cho, Kyu-Cheol;Ma, Yong-Beom;Lee, Jong-Sik
    • Journal of the Korea Society of Computer and Information
    • /
    • v.12 no.6
    • /
    • pp.167-175
    • /
    • 2007
  • Fuzzy logic is used to represent qualitative knowledge and provides interpretability to a controlling system model in bioinformatics. This paper focuses on bioinformatics data classification, an important bioinformatics application. This paper reviews two traditional controlling system models. The sequence-based threshold controller has problems deciding the optimal range for threshold readjustment and requires a long processing time for optimal threshold induction. The binary-based threshold controller does not guarantee early system stability in GPCR data classification for optimal threshold induction. To solve these problems, we propose a fuzzy-based threshold controller for ART1 clustering in GPCR classification. We implement the proposed method and measure processing time while changing the induction recognition success rate and the classification threshold value. We then compare the proposed method with the sequence-based and binary-based threshold controllers. The fuzzy-based threshold controller continuously readjusts threshold values using a membership function of the previous recognition success rate. It maintains system stability and improves classification system efficiency in GPCR classification.
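The idea of readjusting the threshold "using a membership function of the previous recognition success rate" can be sketched with a small fuzzy controller. Triangular membership functions grade the success rate as low, medium, or high. Each level proposes a threshold change, and the changes are combined by a membership-weighted average. The membership breakpoints, the rule outputs in `RULES`, and their directions are hypothetical placeholders, not values from the paper, and the ART1 clustering loop itself is not shown.

```python
from typing import Dict, Tuple

# Hypothetical fuzzy sets over the previous recognition success rate (0..1).
MEMBERSHIP: Dict[str, Tuple[float, float, float]] = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}
# Hypothetical rule base: threshold change proposed by each success-rate level.
RULES: Dict[str, float] = {"low": +0.05, "medium": +0.01, "high": -0.01}


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside (a, c), rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def adjust_threshold(threshold: float, success_rate: float,
                     lo: float = 0.0, hi: float = 1.0) -> float:
    """Weighted-average (Sugeno-style) defuzzification, clamped to [lo, hi]."""
    weights = {level: triangular(success_rate, *abc) for level, abc in MEMBERSHIP.items()}
    total = sum(weights.values())
    delta = sum(weights[level] * RULES[level] for level in RULES) / total if total else 0.0
    return min(hi, max(lo, threshold + delta))
```

After each clustering pass, the measured success rate would be passed to `adjust_threshold` to obtain the next pass's threshold. Because neighbouring membership functions overlap, the adjustment changes smoothly with the success rate instead of jumping between fixed steps.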