• Title/Summary/Keyword: Sequence Classification

An Efficient Feature Point Extraction and Comparison Method through Distorted Region Correction in 360-degree Realistic Contents

  • Park, Byeong-Chan;Kim, Jin-Sung;Won, Yu-Hyeon;Kim, Young-Mo;Kim, Seok-Yoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.1
    • /
    • pp.93-100
    • /
    • 2019
  • One of the critical issues in handling 360-degree realistic contents is performance degradation in the search and recognition process, since such contents support up to 4K UHD quality and cover all image angles, including the front, back, left, right, top, and bottom parts of a screen. To solve this problem, we propose an efficient search and comparison method for 360-degree realistic contents. The proposed method first corrects the distortion in the less distorted regions, such as the front, left, and right parts of the image, excluding severely distorted regions such as the upper and lower parts. It then extracts feature points from the corrected regions and selects representative images through sequence classification. When a query image is input, the search results are provided through feature point comparison. The experimental results show that the proposed method can solve the problem of performance deterioration that arises when 360-degree realistic contents are recognized, compared with traditional 2D contents.

Prediction of subcellular localization of proteins using pairwise sequence alignment and support vector machine

  • Kim, Jong-Kyoung;Raghava, G. P. S.;Kim, Kwang-S.;Bang, Sung-Yang;Choi, Seung-Jin
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2004.11a
    • /
    • pp.158-166
    • /
    • 2004
  • Predicting the destination of a protein in a cell gives valuable information for annotating the function of the protein. Recent technological breakthroughs have led to more accurate methods for predicting the subcellular localization of proteins. The most important factor in determining the accuracy of these methods is the way useful features are extracted from protein sequences. We propose a new method that extracts appropriate features from the sequence data alone by computing pairwise sequence alignment scores. A support vector machine (SVM) is used as the classifier. The overall prediction accuracy, evaluated by the jackknife validation technique, reaches 94.70% for the eukaryotic non-plant data set and 92.10% for the eukaryotic plant data set, which is the highest prediction accuracy among methods reported so far on these data sets. Our numerical experimental results confirm that our feature extraction method based on pairwise sequence alignment is useful for this classification problem.
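
The feature extraction idea in this abstract can be sketched as follows: each sequence is represented by its alignment scores against a set of reference sequences, and the resulting vector is what a classifier such as an SVM would consume. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the alignment algorithm or scoring scheme, so the Needleman-Wunsch global alignment, the match/mismatch/gap values, and the function names below are all illustrative choices.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions; a real system would
    more likely use a substitution matrix.
    """
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def alignment_features(query, references):
    """Feature vector: one alignment score per reference sequence."""
    return [global_alignment_score(query, ref) for ref in references]
```

Each entry of the returned vector measures how well the query aligns with one reference, so sequences with similar alignment profiles end up close together in feature space.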

Effect of Training Sequence Control in On-line Learning for Multilayer Perceptron (다계층 퍼셉트론의 온라인 학습에서 학습 순서 제어의 효과)

  • Lee, Jae-Young;Kim, Hwang-Soo
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.7
    • /
    • pp.491-502
    • /
    • 2010
  • When human beings acquire and develop knowledge through education, their prior knowledge influences the next learning process. Since the same consideration applies to machine learning, we need to examine how controlling the order of the training sequence affects learning. In this research, the role of the supervisor is extended to control the order of training samples, in addition to providing the target values for classification problems. The supervisor sequences the training examples, categorized by SOM, for the learning model, which in this case is an MLP. The proposed method is distinguished in that it selects the most instructive example from the categories formed by SOM to assist the learning progress, whereas other methods use SOM only as a preprocessing step for training samples. The results show that the method is effective in terms of the number of samples used and the time taken in training.

Distribution of Soil Series in Jeju Island by Proximity and Altitude (해발고도 및 인접성에 의한 제주도 토양통 분포특성)

  • Moon, Kyung-Hwan;Lim, Han-Cheol;Hyun, Hae-Nam
    • Korean Journal of Soil Science and Fertilizer
    • /
    • v.40 no.3
    • /
    • pp.221-228
    • /
    • 2007
  • A quantitative analysis of the distribution characteristics of soils in Jeju Island was conducted using geographic information system (GIS) technology. Soil series could be classified into five groups by cluster analysis of the proximity ratios among soil series, where a proximity ratio is the ratio of the boundary length shared with another soil to the total boundary length. Classification by proximity alone was similar to the conventional classification system of the detailed soil map, although the conventional system was built from several criteria such as soil color, altitude, and chemical characteristics of soils. An altitudinal sequence of soil series was also suggested from their representative altitudes, which could be found from the areal distribution curve along altitude. The sequence was brown forest soils - black soils - very dark brown soils - dark brown soils from the peak of Halla Mt. to the coast on all sides, which may be related to the pedogenesis process in Jeju Island.
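
The proximity ratio described in this abstract can be illustrated with a short sketch. It assumes the shared boundary length between each pair of soil series has already been measured (for example, from a GIS layer). The function name and the input layout are hypothetical, and the total boundary length of a series is taken here as the sum of its shared boundaries, which is an assumption the abstract does not spell out.

```python
from collections import defaultdict


def proximity_ratios(boundaries):
    """Compute proximity ratios from shared boundary lengths.

    boundaries maps an unordered pair (a, b) of soil series to the length
    of the boundary they share. The ratio of a to b is that shared length
    divided by the total boundary length of a.
    """
    totals = defaultdict(float)
    for (a, b), length in boundaries.items():
        totals[a] += length
        totals[b] += length

    ratios = defaultdict(dict)
    for (a, b), length in boundaries.items():
        ratios[a][b] = length / totals[a]
        ratios[b][a] = length / totals[b]
    return {series: dict(row) for series, row in ratios.items()}
```

The rows of the resulting table are the kind of input a cluster analysis could take, with one row per soil series.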

Classification Protein Subcellular Locations Using n-Gram Features (단백질 서열의 n-Gram 자질을 이용한 세포내 위치 예측)

  • Kim, Jinsuk
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2007.11a
    • /
    • pp.12-16
    • /
    • 2007
  • The function of a protein is closely correlated with its subcellular location(s). Given a protein sequence, determining its subcellular location is therefore a vitally important problem. We have developed a new prediction method for protein subcellular location(s), based on n-gram feature extraction and the k-nearest neighbor (kNN) classification algorithm. It assigns a protein sequence to one or more subcellular compartments based on the locations of the top k sequences that show the highest similarity weights against the input sequence. The similarity weight is a similarity measure determined by comparing the n-gram features of two sequences. Currently, our method extracts penta-grams as features of protein sequences, computes scores for the potential localization site(s) using the kNN algorithm, and finally presents the locations and their associated scores. We constructed a large-scale data set of protein sequences with known subcellular locations from the SWISS-PROT database. This data set contains 51,885 entries with one or more known subcellular locations. Our method shows a very high prediction precision of about 93% on this data set and, compared with other methods, it also showed comparable prediction improvement on a test collection used in a previous work.
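
The pipeline in this abstract (penta-gram extraction, similarity weights, kNN scoring of locations) can be sketched as follows. This is a minimal illustration and not the authors' method: the abstract does not define the similarity weight, so the count of shared n-grams used here is an assumed stand-in, and the function names and the default value of `k` are hypothetical.

```python
from collections import Counter, defaultdict


def ngrams(seq, n=5):
    """Multiset of overlapping n-grams (penta-grams by default)."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def similarity(a, b, n=5):
    """Assumed similarity weight: number of n-grams shared by a and b."""
    return sum((ngrams(a, n) & ngrams(b, n)).values())


def predict_locations(query, labelled, k=3, n=5):
    """Score locations using the k most similar labelled sequences.

    labelled is a list of (sequence, set_of_locations) pairs. Each
    neighbour adds its similarity weight to every location it carries.
    """
    neighbours = sorted(
        ((similarity(query, seq, n), locs) for seq, locs in labelled),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for weight, locs in neighbours:
        for loc in locs:
            scores[loc] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because each labelled sequence may carry several locations, the output is a ranked list of locations with scores rather than a single label, matching the multi-location setting the abstract describes.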

The extension of the largest generalized-eigenvalue based distance metric $D_{ij}(\gamma_1)$ in arbitrary feature spaces to classify composite data points

  • Daoud, Mosaab
    • Genomics & Informatics
    • /
    • v.17 no.4
    • /
    • pp.39.1-39.20
    • /
    • 2019
  • Analyzing patterns in data points embedded in linear and non-linear feature spaces is a common research problem across several research areas, for example data mining, machine learning, pattern recognition, and multivariate analysis. In this paper, data points are heterogeneous sets of biosequences (composite data points). A composite data point is a set of ordinary data points (e.g., a set of feature vectors). We theoretically extend the derivation of the largest generalized eigenvalue-based distance metric $D_{ij}(\gamma_1)$ to any linear and non-linear feature space. We prove that $D_{ij}(\gamma_1)$ is a metric under any linear and non-linear feature transformation function. We show the sufficiency and efficiency of using the decision rule $\bar{{\delta}}_{{\Xi}i}$ (i.e., the mean of $D_{ij}(\gamma_1)$) in the classification of heterogeneous sets of biosequences, compared with the decision rules $\min_{{\Xi}i}$ and $\mathrm{median}_{{\Xi}i}$. We analyze the impact of linear and non-linear transformation functions on classifying and clustering collections of heterogeneous sets of biosequences. We also show empirically how the length of a sequence in a simulated heterogeneous sequence set affects the classification and clustering results in linear and non-linear feature spaces. We propose a new concept: the limiting dispersion map of the existing clusters in heterogeneous sets of biosequences embedded in linear and non-linear feature spaces, which is based on the limiting distribution of nucleotide compositions estimated from real data sets. Finally, empirical conclusions and scientific evidence are drawn from the experiments to support the theoretical results stated in this paper.

Angle Difference Based State Transition Modeling Technique for the Classification of Signal Pattern from the Sensor Array (센서 어레이의 신호패턴 분류를 위한 각도 변이 기반 상태 천이 모델링 기법)

  • Kim, A-Ram;Lee, Seung-Jae;Kim, Sung-Kyung;Park, Soo-Hyun;Kim, Chang-Hwa
    • Journal of the Korea Society for Simulation
    • /
    • v.15 no.3
    • /
    • pp.49-60
    • /
    • 2006
  • We propose a method that uses a state transition model so that a sensed object can be identified by classifying the signal patterns sensed by a sensor array. Focusing on designing a model that distinguishes the sensed object more accurately, we present an idea in which the modeling elements, 'states' and 'transitions', are defined respectively as the equal-sized angle intervals into which the interval $(-\frac{\pi}{2},\frac{\pi}{2})$ is divided, and as the angle differences between adjacent signal values in the signal value sequence sampled from the sensor array at uniform time intervals. In addition, we show the usefulness of our model through experiments.
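
One possible reading of the angle-based quantization in this abstract is sketched below: each pair of adjacent samples defines a slope angle in $(-\frac{\pi}{2},\frac{\pi}{2})$, and that angle is mapped to the equal-width interval containing it. This is an interpretation, not the authors' model. The number of intervals, the unit sampling step `dt`, and the function name are assumptions.

```python
import math


def angle_states(signal, num_states=6, dt=1.0):
    """Map each adjacent pair of samples to an angle-interval index.

    The slope angle atan2(delta, dt) lies in (-pi/2, pi/2) for dt > 0;
    that range is split into num_states equal-width intervals, and the
    index of the interval containing the angle is returned.
    """
    width = math.pi / num_states
    states = []
    for prev, curr in zip(signal, signal[1:]):
        angle = math.atan2(curr - prev, dt)
        index = int((angle + math.pi / 2) / width)
        states.append(min(index, num_states - 1))  # guard the upper edge
    return states
```

A sequence of n samples yields n - 1 interval indices, which can then be treated as a symbol sequence for pattern classification.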

LSTM RNN-based Korean Speech Recognition System Using CTC (CTC를 이용한 LSTM RNN 기반 한국어 음성인식 시스템)

  • Lee, Donghyun;Lim, Minkyu;Park, Hosung;Kim, Ji-Hwan
    • Journal of Digital Contents Society
    • /
    • v.18 no.1
    • /
    • pp.93-99
    • /
    • 2017
  • A hybrid approach using a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) has shown great improvement in speech recognition accuracy. Training an acoustic model with the hybrid approach requires forced alignment of the HMM state sequence from a Gaussian Mixture Model (GMM)-Hidden Markov Model (HMM). However, training the GMM-HMM requires high computation time. This paper proposes an end-to-end approach for LSTM RNN-based Korean speech recognition to improve learning speed. A Connectionist Temporal Classification (CTC) algorithm is proposed to implement this approach. The proposed method showed almost equal performance in recognition rate, while its learning speed was 1.27 times faster.
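
CTC removes the need for a frame-level forced alignment because many frame-level label paths map to the same output sequence. The sketch below shows only that many-to-one collapse step (merge repeated labels, then drop blanks), as used in greedy decoding. It is not the training procedure from the paper, and the blank symbol and the example labels are hypothetical.

```python
def ctc_collapse(frame_labels, blank="_"):
    """Collapse a per-frame label path into an output label sequence.

    Consecutive repeats are merged first, then blank symbols are dropped,
    so a blank between two identical labels keeps both of them.
    """
    output = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output
```

For example, the paths `a a _ b` and `a _ _ b` both collapse to `a b`, which is why no single alignment has to be fixed in advance.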

IoT Malware Detection and Family Classification Using Entropy Time Series Data Extraction and Recurrent Neural Networks (엔트로피 시계열 데이터 추출과 순환 신경망을 이용한 IoT 악성코드 탐지와 패밀리 분류)

  • Kim, Youngho;Lee, Hyunjong;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.5
    • /
    • pp.197-202
    • /
    • 2022
  • IoT (Internet of Things) devices are being attacked by malware that exploits their many security vulnerabilities, such as weak IDs/passwords and unauthenticated firmware updates. However, the diversity of CPU architectures makes it difficult to set up a malware analysis environment and to design features. In this paper, we design time series features from the byte sequence of executable files so that the features are independent of CPU architecture, and we analyze them using recurrent neural networks. The proposed feature is a fixed-length time series pattern extracted from the byte sequence by calculating partial entropy and applying linear interpolation. Temporal changes in the extracted feature are analyzed by RNN and LSTM. In the experiments, IoT malware detection showed high performance, while malware family classification showed low performance. When the entropy patterns of the malware families were compared visually, the Tsunami and Gafgyt families showed similar patterns, which explains the low performance. LSTM is more suitable than RNN for learning temporal changes in the proposed malware features.
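
The feature described in this abstract can be sketched as follows: compute the entropy of successive chunks of the byte sequence, then linearly interpolate the resulting series to a fixed length. This is a minimal sketch under assumptions: "partial entropy" is taken here to mean Shannon entropy over non-overlapping windows, and the window size and function names are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(chunk):
    """Shannon entropy in bits per byte of a non-empty byte chunk."""
    total = len(chunk)
    return -sum(c / total * math.log2(c / total) for c in Counter(chunk).values())


def entropy_series(data, window=256):
    """Entropy of each non-overlapping window of the byte sequence."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]


def resample(series, length):
    """Linearly interpolate a non-empty series to a fixed length."""
    if len(series) == 1 or length == 1:
        return [series[0]] * length
    out = []
    for k in range(length):
        pos = k * (len(series) - 1) / (length - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out
```

Files of different sizes produce entropy series of different lengths, so the interpolation step is what gives the recurrent model a fixed-length input.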

A Novel Algorithm for Fault Classification in Transmission Lines Using a Combined Adaptive Network and Fuzzy Inference System

  • Yeo, Sang-Min;Kim, Chun-Hwan
    • KIEE International Transactions on Power Engineering
    • /
    • v.3A no.4
    • /
    • pp.191-197
    • /
    • 2003
  • Accurate detection and classification of faults on transmission lines is vitally important. Many different types of faults occur, including low impedance faults (LIF) and high impedance faults (HIF). The latter in particular pose difficulties for the commonly employed conventional overcurrent and distance relays and, if undetected, can damage expensive equipment, threaten life, and cause fire hazards. Although HIFs are far less common than LIFs, it is imperative that any protection device deal satisfactorily with both. Because of the randomness and asymmetric characteristics of HIFs, their modeling is difficult, and numerous papers on various HIF models have been published. In this paper, HIFs in transmission lines are modeled using the characteristics of a ZnO arrester, and the model is then implemented within the overall transmission system model based on the electromagnetic transients program (EMTP). This paper proposes an algorithm for fault detection and classification for both LIFs and HIFs using an Adaptive Network-based Fuzzy Inference System (ANFIS). The inputs to the ANFIS are current signals only, based on the Root-Mean-Square (RMS) values of the 3-phase currents and the zero sequence current. The performance of the proposed algorithm is tested on a typical 154 kV Korean transmission line system under various fault conditions. Test results demonstrate that the ANFIS can detect and classify faults, including LIFs and HIFs, accurately within half a cycle.
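
The classifier inputs named in this abstract, RMS values of the three phase currents and of the zero sequence current, can be computed as in the sketch below. It covers only this input stage and does not implement ANFIS. It uses the standard instantaneous definition of the zero sequence current, $i_0 = (i_a + i_b + i_c)/3$. The function names and the choice of sample window are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square value of a non-empty window of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def zero_sequence(ia, ib, ic):
    """Instantaneous zero sequence current, (ia + ib + ic) / 3 per sample."""
    return [(a + b + c) / 3 for a, b, c in zip(ia, ib, ic)]


def rms_current_features(ia, ib, ic):
    """Four features: RMS of phases a, b, c and of the zero sequence current."""
    return [rms(ia), rms(ib), rms(ic), rms(zero_sequence(ia, ib, ic))]
```

In a balanced system the three phase currents sum to zero, so the fourth feature stays near zero and rises when a ground fault unbalances the phases.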