• Title/Summary/Keyword: Sequence Classification

Could Decimal-binary Vector be a Representative of DNA Sequence for Classification?

  • Sanjaya, Prima; Kang, Dae-Ki
    • International journal of advanced smart convergence / v.5 no.3 / pp.8-15 / 2016
  • In recent years, the Deep Belief Network (DBN), a deep learning model formed by stacking restricted Boltzmann machines in a greedy fashion, has been widely used for classification and recognition. Because it can extract high-level abstract features and handle high-dimensional data, this model has achieved outstanding results in image and speech recognition. In this research, we assess the applicability of deep learning to DNA classification. Since training a DBN is computationally expensive, especially for DNA sequences with thousands of variables, we introduce a new encoding method that uses a decimal-binary vector to represent the sequence as input to the model, and then compare it with one-hot-vector encoding on two datasets. We evaluated the proposed model with different contrastive algorithms and achieved a significant improvement in training speed with comparable classification results. These results show the potential of using decimal-binary vectors with DBNs for DNA sequences, and of applying the approach to other sequence problems in bioinformatics.
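
A minimal sketch of why a denser encoding shrinks the DBN input layer. The abstract does not define the decimal-binary scheme, so this assumes one plausible reading: each nucleotide gets a decimal index (A=0, C=1, G=2, T=3, an ordering chosen here for illustration) written as a fixed-width 2-bit binary code, versus 4 bits per nucleotide for one-hot. `one_hot` and `decimal_binary` are hypothetical names, not the authors' code.

```python
# Hypothetical index assignment; the paper's actual decimal-binary scheme may differ.
NUCLEOTIDE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot(seq):
    """4 bits per nucleotide: a 1 at the nucleotide's index, 0 elsewhere."""
    vec = []
    for base in seq.upper():
        bits = [0, 0, 0, 0]
        bits[NUCLEOTIDE_INDEX[base]] = 1
        vec.extend(bits)
    return vec

def decimal_binary(seq, bits_per_base=2):
    """Assumed scheme: each base's decimal index written in fixed-width binary."""
    vec = []
    for base in seq.upper():
        index = NUCLEOTIDE_INDEX[base]
        vec.extend(int(b) for b in format(index, f"0{bits_per_base}b"))
    return vec

if __name__ == "__main__":
    print(one_hot("GATC"))         # [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0]
    print(decimal_binary("GATC"))  # [1, 0, 0, 0, 1, 1, 0, 1]
```

Under this assumption the decimal-binary input is half as wide as the one-hot input, which is one way fewer visible units could reduce DBN training cost.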

Online Selective-Sample Learning of Hidden Markov Models for Sequence Classification

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.15 no.3 / pp.145-152 / 2015
  • We consider an online selective-sample learning problem for sequence classification, where the goal is to learn a predictive model using a stream of data samples whose class labels can be selectively queried by the algorithm. Given that there is a limit to the total number of queries permitted, the key issue is choosing the most informative and salient samples for their class labels to be queried. Recently, several aggressive selective-sample algorithms have been proposed under a linear model for static (non-sequential) binary classification. We extend the idea to hidden Markov models for multi-class sequence classification by introducing reasonable measures for the novelty and prediction confidence of the incoming sample with respect to the current model, on which the query decision is based. For several sequence classification datasets/tasks in online learning setups, we demonstrate the effectiveness of the proposed approach.
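
A hedged sketch of a budget-limited query decision for HMM-based sequence classifiers. It assumes discrete-observation HMMs (one per class) given as `(start, trans, emit)` tuples, computes each class likelihood with the scaled forward algorithm, and uses two stand-in measures: confidence as the top class posterior under a uniform prior, and novelty as the average per-symbol negative log-likelihood under the best-fitting class. These measures, the thresholds, and `should_query` are hypothetical, not the paper's definitions.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-observation HMM."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    total = sum(alpha)
    log_p = math.log(total)
    alpha = [a / total for a in alpha]  # rescale to avoid underflow
    for o in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
        total = sum(alpha)
        log_p += math.log(total)
        alpha = [a / total for a in alpha]
    return log_p

def should_query(obs, class_models, budget_left, conf_threshold=0.8, novelty_threshold=2.0):
    """Hypothetical rule: query the label if budget remains and the sample is
    either uncertain (low top posterior) or novel (poorly explained by every class)."""
    if budget_left <= 0:
        return False
    lls = [log_likelihood(obs, *model) for model in class_models]
    best = max(lls)
    # Class posterior under a uniform prior = softmax of the log-likelihoods.
    weights = [math.exp(ll - best) for ll in lls]
    confidence = max(weights) / sum(weights)
    # Average negative log-likelihood per symbol under the best-fitting class.
    novelty = -best / len(obs)
    return confidence < conf_threshold or novelty > novelty_threshold

if __name__ == "__main__":
    # One-state HMMs keep the example tiny: class 0 mostly emits symbol 0, class 1 symbol 1.
    models = [
        ([1.0], [[1.0]], [[0.9, 0.1]]),
        ([1.0], [[1.0]], [[0.1, 0.9]]),
    ]
    print(should_query([0, 0, 0, 0], models, budget_left=5))  # False: confident and familiar
    print(should_query([0, 1, 0, 1], models, budget_left=5))  # True: the two classes tie
    print(should_query([0, 1, 0, 1], models, budget_left=0))  # False: budget spent
```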

Korean Semantic Role Labeling Using Structured SVM

  • Lee, Changki; Lim, Soojong; Kim, Hyunki
    • Journal of KIISE / v.42 no.2 / pp.220-226 / 2015
  • Semantic role labeling (SRL) systems determine the semantic role labels of the arguments of predicates in natural language text. An SRL system usually needs to perform four tasks in sequence: Predicate Identification (PI), Predicate Classification (PC), Argument Identification (AI), and Argument Classification (AC). In this paper, we use the Korean Propbank to develop our Korean semantic role labeling system. We describe our Korean semantic role labeling system that uses sequence labeling with structured Support Vector Machine (SVM). The results of our experiments on the Korean Propbank dataset reveal that our method obtains a 97.13% F1 score on Predicate Identification and Classification (PIC), and a 76.96% F1 score on Argument Identification and Classification (AIC).
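
Structured (structural) SVMs for sequence labeling score an entire label sequence and, at prediction time, return the highest-scoring one; with first-order label dependencies that argmax is usually computed by Viterbi decoding. A minimal sketch of only that inference step, assuming per-token emission scores and label-transition scores are already given (in a trained model they would come from the learned weight vector). The label names and scores in the usage example are illustrative, not taken from the Korean Propbank.

```python
def viterbi(emission, transition, labels):
    """Return the highest-scoring label sequence.

    emission[t][y]   : score of label index y at token t
    transition[p][y] : score of moving from label index p to label index y
    """
    n_labels = len(labels)
    scores = [list(emission[0])]  # best score of any path ending in each label
    backptrs = []                 # backptrs[t-1][y] = best previous label for y at t
    for t in range(1, len(emission)):
        prev = scores[-1]
        row, ptr = [], []
        for y in range(n_labels):
            best_p = max(range(n_labels), key=lambda p: prev[p] + transition[p][y])
            row.append(prev[best_p] + transition[best_p][y] + emission[t][y])
            ptr.append(best_p)
        scores.append(row)
        backptrs.append(ptr)
    # Trace back from the best final label.
    y = max(range(n_labels), key=lambda j: scores[-1][j])
    path = [y]
    for ptr in reversed(backptrs):
        y = ptr[y]
        path.append(y)
    return [labels[i] for i in reversed(path)]

if __name__ == "__main__":
    labels = ["O", "ARG0", "ARG1"]
    emission = [[0, 2, 0], [1, 0, 0.5], [0, 0, 2]]
    no_transition = [[0] * 3 for _ in range(3)]
    print(viterbi(emission, no_transition, labels))  # ['ARG0', 'O', 'ARG1']
    penalized = [[0] * 3 for _ in range(3)]
    penalized[1][0] = -5  # discourage ARG0 -> O
    print(viterbi(emission, penalized, labels))      # ['ARG0', 'ARG1', 'ARG1']
```

The second call shows why structured prediction differs from per-token classification: a transition penalty changes the middle label even though its own emission score favors `O`.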

Improving Malicious Web Code Classification with Sequence by Machine Learning

  • Paik, Incheon
    • IEIE Transactions on Smart Processing and Computing / v.3 no.5 / pp.319-324 / 2014
  • Web applications make life more convenient. Many web applications accept several kinds of user input (e.g., personal information or a user's comments on commercial goods). However, the input functions of web applications contain a range of vulnerabilities, and malicious actions can be attempted by exploiting the free accessibility of many web applications. Attacks that exploit these input vulnerabilities inject malicious web code, which enables a variety of illegal actions such as SQL Injection Attacks (SQLIAs) and Cross-Site Scripting (XSS). These actions can result in theft, tampering with personal information, or phishing. Existing solutions parse the code, are limited to fixed and very small patterns, and are difficult to adapt to variations. A machine learning method can cover a far broader range of malicious web code and adapts easily to variations and changes. Therefore, this paper proposes an adaptable machine-learning classification of malicious web code for detecting the exploitation of user inputs. The approach generally identifies "looks-like malicious" code as real malicious code. A more detailed classification using sequence information is also introduced. The precision is 99% for "looks-like malicious" code and 90% for the precise classification with sequence information.
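
The abstract does not list the features behind the sequence-based classification. As one hedged illustration of what "sequence information" can mean for injected code, this sketch maps raw input to coarse tokens (keywords, numbers, words, punctuation) in order and counts token bigrams, which a classifier could then consume. The keyword list, token classes, and the names `tokenize` and `ngram_features` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical token classes; the paper's actual feature set is not described in the abstract.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]+|\d+|--|[^\sA-Za-z_\d]")
KEYWORDS = {"select", "union", "or", "and", "script", "alert", "drop", "insert"}

def tokenize(text):
    """Map raw input to coarse tokens, preserving their order."""
    tokens = []
    for tok in TOKEN_PATTERN.findall(text):
        low = tok.lower()
        if low in KEYWORDS:
            tokens.append(low.upper())   # keep suspicious keywords distinct
        elif tok.isdigit():
            tokens.append("NUM")
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("WORD")        # any other identifier-like word
        else:
            tokens.append(tok)           # punctuation and "--" kept verbatim
    return tokens

def ngram_features(text, n=2):
    """Count token n-grams; order-sensitive features a classifier could use."""
    tokens = tokenize(text)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

if __name__ == "__main__":
    print(tokenize("' OR 1=1 --"))  # ["'", 'OR', 'NUM', '=', 'NUM', '--']
    print(ngram_features("<script>alert(1)</script>")[("SCRIPT", ">")])  # 2
```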

Protein Disorder/Order Region Classification Using EPs-TFP Mining Method

  • Lee, Heon Gyu; Shin, Yong Ho
    • Journal of Korea Society of Industrial Information Systems / v.17 no.6 / pp.59-72 / 2012
  • Since a protein exhibits its specific functions when a disordered region of its sequence transitions to an ordered region, provoking a biological reaction, separating disorder regions from order regions in the sequence data is urgently needed for predicting the three-dimensional structure and characteristics of the protein. To classify disorder and order regions efficiently, this paper proposes a classification/prediction method that uses sequence data, yields results that are not biased toward specific protein characteristics, and improves classification speed. The emerging-pattern-based EPs-TFP method uses only essential emerging patterns, from which redundant emerging patterns have been removed. The method finds sequence patterns that occur frequently in disorder regions but relatively infrequently in order regions. To improve the performance of the proposed algorithm, we extend the TFP method, which is built on P-tree and T-tree structures, into a classification/prediction method. We evaluated the EPs-TFP technique on Disprot 4.9 and CASP 7 data; for order/disorder classification it achieved a sensitivity of 73.6, a specificity of 69.51, and an accuracy of 74.2.
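
Emerging patterns are patterns whose support grows sharply from one class to another, measured by the growth rate (ratio of supports). This brute-force sketch illustrates the idea for contiguous amino-acid substrings (k-mers) that are frequent in disorder regions but rare in order regions. It does not implement TFP or its P-tree/T-tree structures; the "skip any pattern that contains an already-selected shorter pattern" rule is a simplified stand-in for removing redundant patterns, and the thresholds and toy sequences are placeholders.

```python
from collections import Counter

def kmer_support(sequences, k):
    """Fraction of sequences containing each length-k substring at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {kmer: c / len(sequences) for kmer, c in counts.items()}

def emerging_patterns(disorder, order, k_values=(1, 2, 3), min_support=0.3, min_growth=2.0):
    """Return {pattern: growth rate} for k-mers emerging in `disorder` relative to `order`."""
    selected = {}
    for k in sorted(k_values):
        target = kmer_support(disorder, k)
        background = kmer_support(order, k)
        for kmer, supp in target.items():
            if supp < min_support:
                continue
            # Simplified redundancy filter: prefer the shortest qualifying pattern.
            if any(p in kmer for p in selected):
                continue
            other = background.get(kmer, 0.0)
            growth = float("inf") if other == 0 else supp / other
            if growth >= min_growth:
                selected[kmer] = growth
    return selected

if __name__ == "__main__":
    disorder = ["PPEEK", "EEPPS", "KPPEE"]  # toy sequences, not real data
    order = ["LIVAL", "VILAP", "AILVE"]
    print(sorted(emerging_patterns(disorder, order, min_growth=4.0)))
    # ['EE', 'EP', 'K', 'PE', 'PP', 'S']
```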

Reference String Recognition based on Word Sequence Tagging and Post-processing: Evaluation with English and German Datasets

  • Kang, In-Su
    • Journal of the Korea Society of Computer and Information / v.23 no.5 / pp.1-7 / 2018
  • Reference string recognition is the task of extracting individual reference strings from the reference section of an academic article, which consists of a sequence of reference lines. This task has been addressed by heuristic-based, clustering-based, and classification-based approaches that exploit the lexical and layout characteristics of reference lines. Most classification-based methods have used sequence labeling to assign labels either to a sequence of tokens within reference lines or to a sequence of reference lines. Unlike the previous token-level sequence labeling approach, this study assigns different labels to the beginning, intermediate, and terminating tokens of a reference string. Post-processing is then applied to identify reference strings by predicting their beginning and/or terminating tokens. Experimental evaluation on English and German reference string recognition datasets shows that the proposed method achieves a macro-averaged F1 above 94%.
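
The post-processing step can be illustrated with a small grouping routine. This sketch assumes a hypothetical tag set of `B` (beginning), `I` (intermediate), and `E` (terminating) over the tokens of the concatenated reference lines, and one plausible rule: a `B` opens a new reference string (closing any open one) and an `E` closes the current one, so a reference is still recovered when either its beginning or its terminating tag is missing. `group_references` is an illustrative name, not the author's post-processor.

```python
def group_references(tokens, labels):
    """Group tokens into reference strings using B/I/E token labels (hypothetical tag set)."""
    references, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B" and current:
            # A new beginning implicitly terminates the open reference.
            references.append(" ".join(current))
            current = []
        current.append(token)
        if label == "E":
            references.append(" ".join(current))
            current = []
    if current:  # flush a reference left open at the end of the section
        references.append(" ".join(current))
    return references

if __name__ == "__main__":
    tokens = ["Kim,", "M.", "(2015).", "Lee,", "C.", "KIISE."]
    print(group_references(tokens, ["B", "I", "E", "B", "I", "I"]))
    # ['Kim, M. (2015).', 'Lee, C. KIISE.']
```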

Negative Selection Algorithm for DNA Sequence Classification

  • Lee, Dong Wook; Sim, Kwee-Bo
    • International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.2 / pp.231-235 / 2004
  • As the DNA sequences of humans and other living things are revealed, demand is increasing for new computational methods that utilize DNA sequence information. In this paper, we propose a classification algorithm based on the negative selection of the immune system to classify DNA patterns. Negative selection is the process of determining antigen receptors that recognize antigens, i.e., nonself cells. Immune cells use these receptors to judge whether a cell is self or nonself. If n groups of antigen receptors are composed for n different patterns, inputs can be classified into those n patterns. The proposed pattern classification algorithm applies negative selection at both the nucleotide-base level and the amino-acid level.
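
Negative selection can be sketched at the nucleotide level, assuming the common r-contiguous matching rule from the artificial immune systems literature (two equal-length strings match if they agree on at least r consecutive positions), random candidate detectors over the alphabet `ACGT`, and one detector group per class censored against that class's "self" sequences. Assigning a sample to the group whose detectors fire least is a hypothetical decision rule chosen to make the "n groups for n patterns" idea concrete; the function names are illustrative.

```python
import random

def r_contiguous_match(a, b, r):
    """True if equal-length strings a and b agree on at least r consecutive positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def train_detectors(self_set, length, r, n_detectors, alphabet="ACGT", rng=None, max_tries=100_000):
    """Negative selection: keep random candidates that match no 'self' string."""
    rng = rng or random.Random(0)
    detectors, tries = [], 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors

def classify(sample, detector_groups, r):
    """Hypothetical rule: pick the class whose detectors fire least (ties -> lowest index)."""
    hits = [sum(r_contiguous_match(sample, d, r) for d in group) for group in detector_groups]
    return hits.index(min(hits))

if __name__ == "__main__":
    self_sets = [["AAAAAAAA", "AAAAAAAC"], ["TTTTTTTT", "TTTTTTTG"]]
    groups = [train_detectors(s, length=8, r=4, n_detectors=50) for s in self_sets]
    print(classify("AAAAAAAA", groups, r=4))  # 0: no class-0 detector can match its own self string
```

Since detectors are censored against self, a class's own training sequences never trigger its group; a real system would also tune r and the detector count to control how often other classes fire.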

A Domain Action Classification Model Using Conditional Random Fields

  • Kim, Hark-Soo
    • Korean Journal of Cognitive Science / v.18 no.1 / pp.1-14 / 2007
  • In a goal-oriented dialogue, speakers' intentions can be represented by domain actions, which consist of pairs of a speech act and a concept sequence. Therefore, to implement an intelligent dialogue system, it is very important to correctly infer domain actions from surface utterances. In this paper, we propose a statistical model that uses conditional random fields to determine speech acts and concept sequences simultaneously. To avoid biased learning problems, the proposed model uses low-level linguistic features such as lexical items and parts of speech, and then filters out uninformative features using the chi-square statistic. In experiments in a schedule-arrangement domain, the proposed system performed well, with 93.0% precision on speech act classification and 90.2% precision on concept sequence classification.
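
The chi-square filtering step can be sketched with the standard 2x2 contingency-table statistic for one feature and one class. In this sketch each utterance is a set of features (e.g., words and POS tags), a feature is kept when its largest chi-square over all classes reaches a threshold, and the default 3.84 (the 0.05 critical value for one degree of freedom) is an assumed cutoff, not the paper's. The CRF itself is not shown; `select_features` and the toy labels are illustrative.

```python
def chi_square(a, b, c, d):
    """Chi-square for a 2x2 table:
    a = feature present & in class,  b = present & not in class,
    c = absent & in class,           d = absent & not in class."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom

def select_features(samples, labels, threshold=3.84):
    """Keep features whose maximum chi-square over classes reaches the threshold."""
    features = set().union(*samples)
    kept = {}
    for f in features:
        best = 0.0
        for cls in set(labels):
            a = sum(1 for s, y in zip(samples, labels) if f in s and y == cls)
            b = sum(1 for s, y in zip(samples, labels) if f in s and y != cls)
            c = sum(1 for s, y in zip(samples, labels) if f not in s and y == cls)
            d = len(samples) - a - b - c
            best = max(best, chi_square(a, b, c, d))
        if best >= threshold:
            kept[f] = best
    return kept

if __name__ == "__main__":
    samples = [{"when", "NNG"}] * 4 + [{"NNG", "VV"}] * 4
    labels = ["ask-ref"] * 4 + ["inform"] * 4
    print(sorted(select_features(samples, labels).items()))
    # [('VV', 8.0), ('when', 8.0)]; 'NNG' occurs everywhere and is filtered out
```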

Phylogenetic Relationships of the Aphyllophorales Inferred from Sequence analysis of Nuclear Small Subunit Ribosomal DNA

  • Kim, Seon-Young; Jung, Hack-Sung
    • Journal of Microbiology / v.38 no.3 / pp.122-131 / 2000
  • Phylogenetic classification of the Aphyllophorales was conducted based on analysis of nuclear small subunit ribosomal DNA (nuc SSU rDNA) sequences. Based on phylogenetic groupings and taxonomic characters, 16 families were recognized and discussed. Although many of the characters showed some degree of homoplasy, microscopic characters such as the mitic system and clamps, spore amyloidity, and rot type appeared to be important in the classification of the Aphyllophorales. Phylogenetically significant families were newly defined to improve the classification of the order Aphyllophorales.

A New Galaxy Classification Scheme in the WISE Color-Luminosity Diagram

  • Lee, Gwang-Ho; Sohn, Jubee; Lee, Myung Gyoon
    • The Bulletin of The Korean Astronomical Society / v.38 no.2 / pp.49.1-49.1 / 2013
  • We present a new galaxy classification scheme in the Wide-field Infrared Survey Explorer (WISE) [$3.4{\mu}m$]-[$12{\mu}m$] color versus $12{\mu}m$ luminosity diagram. In this diagram, galaxies can be classified into three groups at different evolutionary stages. Late-type galaxies are distributed linearly along the "MIR star-forming sequence" identified by Hwang et al. (2012). Some early-type galaxies form another sequence at [3.4]-[12] $(AB){\simeq}-2.0$, which we call the 'MIR blue sequence'. These are quiescent systems with stellar populations older than 10 Gyr. Between the MIR star-forming sequence and the MIR blue sequence, some early- and late-type galaxies are sparsely distributed; we call these 'MIR green cloud galaxies'. Interestingly, both MIR blue sequence galaxies and MIR green cloud galaxies lie on the red sequence in the optical color-magnitude diagram. However, MIR green cloud galaxies have lower stellar masses and younger stellar populations (smaller $D_n4000$) than MIR blue sequence galaxies, suggesting that they are in a transition stage from MIR star-forming sequence galaxies to MIR blue sequence galaxies. We present differences in various galaxy properties among the three MIR classes using multi-wavelength data for local (0.03 < z < 0.07) galaxies, combining WISE with the Sloan Digital Sky Survey Data Release 10.
