• Title/Summary/Keyword: sequence analysis

Subband PRI analysis algorithm (Subband PRI 분석 알고리즘)

  • 윤원식
    • The Journal of Korean Institute of Communications and Information Sciences / v.21 no.6 / pp.1425-1429 / 1996
  • A conventional sequence search algorithm for pulse repetition interval (PRI) analysis suffers from the harmonic problem when pulses are missing. An improved PRI analysis algorithm is proposed to remedy this problem. After the overall PRI range is divided into subbands that contain no harmonics, a sequence search is performed both forward and backward in time. The proposed algorithm improves performance compared with the conventional sequence search algorithm.
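
The two steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: it assumes a subband is "harmonic-free" when its upper edge is below twice its lower edge, and the function names, the `ratio` parameter, the tolerance `tol`, and the `max_missing` limit are all hypothetical choices made for the example.

```python
def harmonic_free_subbands(pri_min, pri_max, ratio=1.9):
    """Split [pri_min, pri_max] into bands whose upper edge is below 2x the lower edge.

    Assumption: with hi < 2 * lo, the second harmonic of any PRI in a band
    falls outside that band.
    """
    if not (0 < pri_min < pri_max) or not (1 < ratio < 2):
        raise ValueError("need 0 < pri_min < pri_max and 1 < ratio < 2")
    bands = []
    lo = pri_min
    while lo < pri_max:
        hi = min(lo * ratio, pri_max)
        bands.append((lo, hi))
        lo = hi
    return bands


def sequence_search(toas, seed_index, pri, tol, max_missing=2):
    """Collect pulses spaced by about `pri` from a seed pulse, forward and backward.

    `toas` is a sorted list of times of arrival. The walk in each direction
    stops after more than `max_missing` consecutive expected pulses are absent.
    """
    def walk(direction):
        found = []
        expected = toas[seed_index] + direction * pri
        misses = 0
        while toas[0] - tol <= expected <= toas[-1] + tol and misses <= max_missing:
            match = next((t for t in toas if abs(t - expected) <= tol), None)
            if match is None:
                # Missing pulse: skip one interval and keep looking.
                misses += 1
                expected += direction * pri
            else:
                misses = 0
                found.append(match)
                expected = match + direction * pri
        return found

    return sorted(walk(-1) + [toas[seed_index]] + walk(+1))


# Toy pulse train with PRI 100, one missing pulse at 300, and two stray pulses.
toas = [0, 37, 100, 200, 250, 400, 500]
print(harmonic_free_subbands(100.0, 1000.0))
print(sequence_search(toas, seed_index=3, pri=100, tol=1))
```

In this sketch a candidate PRI would be tested only within its own subband, so a multiple of the true PRI is never evaluated in the same pass as the true PRI itself.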

A Pattern Summary System Using BLAST for Sequence Analysis

  • Choi, Han-Suk;Kim, Dong-Wook;Ryu, Tae-W.
    • Genomics & Informatics / v.4 no.4 / pp.173-181 / 2006
  • Pattern finding is one of the important tasks in protein or DNA sequence analysis, and alignment is a widely used technique for finding patterns. BLAST (Basic Local Alignment Search Tool) is one of the most popular tools in bioinformatics for exploring available DNA or protein sequence databases. BLAST may generate a huge output for large sequence data that contains various sequence patterns. However, BLAST does not provide a tool to summarize and analyze the patterns or matched alignments in its output file, and it lacks general and robust parsing tools to extract the essential information from that output. This paper presents a pattern summary system, a comprehensive tool for discovering pattern structures in the large amounts of sequence data in BLAST output. The pattern summary system can identify clusters of patterns, extract the cluster pattern sequences from the BLAST subject database, and display the clusters graphically to show their distribution in the subject database.

Linear-Time Korean Morphological Analysis Using an Action-based Local Monotonic Attention Mechanism

  • Hwang, Hyunsun;Lee, Changki
    • ETRI Journal / v.42 no.1 / pp.101-107 / 2020
  • For Korean language processing, morphological analysis is a critical component that requires extensive work. It can be conducted in an end-to-end manner with a sequence-to-sequence model, without complicated feature design. However, a sequence-to-sequence model that uses an attention mechanism for high performance has a time complexity of O(n^2) for an input of length n. In this study, we propose a linear-time Korean morphological analysis model using a local monotonic attention mechanism that relies on monotonic alignment, a characteristic of Korean morphological analysis. The proposed model shows a substantial improvement in a single-threaded environment and a high morphometric F1-measure, even for a hard attention model that eliminates the attention mechanism formula.

A Reranking Model for Korean Morphological Analysis Based on Sequence-to-Sequence Model (Sequence-to-Sequence 모델 기반으로 한 한국어 형태소 분석의 재순위화 모델)

  • Choi, Yong-Seok;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering / v.7 no.4 / pp.121-128 / 2018
  • A Korean morphological analyzer can adopt a sequence-to-sequence (seq2seq) model, which can generate an output sequence whose length differs from that of the input. In general, a seq2seq-based Korean morphological analyzer takes a syllable-unit sequence as input and outputs a syllable-unit sequence. Syllable-based morphological analysis has the advantage that unknown words are handled easily, but the disadvantage that morpheme-level information is ignored. In this paper, we propose a reranking model, used as a post-processor of the seq2seq model, that can improve the accuracy of morphological analysis. The seq2seq-based morphological analyzer generates K results by beam search. The reranking model exploits morpheme-unit embedding information as well as morpheme n-grams to reorder the K results. The experimental results show that the reranking model improves the F1 score by 1.17% compared with the original seq2seq model.
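
The reordering step can be illustrated with a small sketch. It is not the paper's model: it stands in a smoothed morpheme bigram score for the morpheme-level features (the embedding information is left out entirely), and the function names, the `weight` parameter, and the toy counts are hypothetical.

```python
import math
from collections import Counter


def bigram_logprob(morphemes, bigram_counts, unigram_counts, vocab_size):
    """Add-one smoothed bigram log-probability of a morpheme sequence."""
    padded = ["<s>"] + list(morphemes)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = unigram_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def rerank(candidates, bigram_counts, unigram_counts, vocab_size, weight=0.5):
    """Reorder K beam candidates by seq2seq score plus a weighted n-gram score.

    Each candidate is a (morphemes, seq2seq_logprob) pair.
    """
    def combined(candidate):
        morphemes, seq2seq_logprob = candidate
        lm_score = bigram_logprob(morphemes, bigram_counts, unigram_counts, vocab_size)
        return seq2seq_logprob + weight * lm_score

    return sorted(candidates, key=combined, reverse=True)


# Toy statistics (hypothetical): the split analysis was seen in training data.
bigrams = Counter({("<s>", "na"): 5, ("na", "neun"): 5})
unigrams = Counter({"<s>": 5, "na": 5})
beam = [(["naneun"], -0.9), (["na", "neun"], -1.0)]
print(rerank(beam, bigrams, unigrams, vocab_size=3))
```

Here the candidate that the seq2seq score ranks second moves to the top once the morpheme-level score is added, which is the effect a reranker is meant to have.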

Correlation Analysis between Regulatory Sequence Motifs and Expression Profiles by Kernel CCA

  • Rhee, Je-Keun;Joung, Je-Gun;Chang, Jeong-Ho;Zhang, Byoung-Tak
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.63-68 / 2005
  • Transcription factors regulate gene expression by binding to the upstream regions of genes, and each transcription factor has a specific binding site in the promoter region. Analysis of gene upstream sequences is therefore necessary for understanding the regulatory mechanisms of genes, under the plausible assumption that DNA sequence motif profiles are closely related to the expression behavior of the corresponding genes. Here, we present an effective approach to analyzing the relation between gene expression profiles and gene upstream sequences on the basis of kernel canonical correlation analysis (kernel CCA). Kernel CCA is a useful method for finding underlying relationships between two different data sets. In an application to a yeast cell cycle data set, we show that gene upstream sequence profiles are closely related to gene expression patterns in terms of canonical correlation scores. By further analyzing the contributing values, or weights, of sequence motifs in the construction of a pair of sequence motif profiles and expression profiles, we show that the proposed method can identify significant DNA sequence motifs involved in specific gene expression patterns in the yeast cell cycle, including both well-known and putative motifs.

A Study on the Sequence Impedance Modeling of Underground Transmission Systems (지중송전선로의 대칭분 임피던스 모델링에 관한 연구)

  • Hwang, Young-Rok;Kim, Kyung-Chul
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.28 no.6 / pp.60-67 / 2014
  • Power system fault analysis is commonly based on the well-known symmetrical component method, which describes power system elements by their positive-, negative-, and zero-sequence impedances. Most faults on transmission lines are unbalanced faults, such as line-to-ground faults, so both positive- and zero-sequence impedances are required for fault analysis. When an unbalanced fault occurs, zero-sequence current flows through the earth and ground wires in overhead transmission systems, and through the cable sheaths and earth in underground transmission systems. Since the distribution of zero-sequence current between the cable sheath and earth depends on both the sheath bonding and the grounding configuration, care must be taken when calculating the zero-sequence impedance of underground cable transmission lines. In this paper, an EMTP-based sequence impedance calculation method is described and applied to 345 kV cable transmission systems. The calculation results show that detailed circuit analysis is desirable to avoid sequence impedance calculation errors resulting from the various configurations of cable sheath bonding and grounding in underground cable transmission systems.
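
For readers unfamiliar with the symmetrical component method mentioned here, the textbook transformation from a 3x3 phase impedance matrix to sequence impedances can be sketched as below. This is the standard transform only, not the EMTP-based procedure of the paper; the function names and the numeric impedances are illustrative assumptions, and the sheath and earth return paths the abstract discusses are not modeled.

```python
import cmath

# The operator a = 1 at 120 degrees used by the symmetrical component transform.
A_OP = cmath.exp(2j * cmath.pi / 3)

A = [
    [1, 1, 1],
    [1, A_OP**2, A_OP],
    [1, A_OP, A_OP**2],
]
A_INV = [
    [x / 3 for x in row]
    for row in ([1, 1, 1], [1, A_OP, A_OP**2], [1, A_OP**2, A_OP])
]


def matmul(x, y):
    """Multiply two 3x3 complex matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def sequence_impedances(z_abc):
    """Return (Z0, Z1, Z2), the diagonal of A^-1 * Z_abc * A.

    Off-diagonal terms, which are nonzero for an asymmetric phase matrix,
    are ignored in this sketch.
    """
    z_012 = matmul(matmul(A_INV, z_abc), A)
    return z_012[0][0], z_012[1][1], z_012[2][2]


# Hypothetical symmetric line: self impedance zs, mutual impedance zm (ohm/km).
zs, zm = 0.10 + 0.50j, 0.05 + 0.20j
z_abc = [[zs, zm, zm], [zm, zs, zm], [zm, zm, zs]]
z0, z1, z2 = sequence_impedances(z_abc)
print(z0, z1, z2)
```

For a symmetric phase matrix like this one, the result reduces to Z0 = Zs + 2Zm and Z1 = Z2 = Zs - Zm, so the zero-sequence impedance is the one most affected by the mutual coupling and return path.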

INSTABILITY OF THE BETTI SEQUENCE FOR PERSISTENT HOMOLOGY AND A STABILIZED VERSION OF THE BETTI SEQUENCE

  • JOHNSON, MEGAN;JUNG, JAE-HUN
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.25 no.4 / pp.296-311 / 2021
  • Topological Data Analysis (TDA), a relatively new field of data analysis, has proved very useful in a variety of applications. The main tool from TDA is persistent homology, in which data structure is examined at many scales. Representations of persistent homology include persistence barcodes and persistence diagrams, neither of which is straightforward to reconcile with traditional machine learning algorithms, as they are sets of intervals or multisets. The problem of faithfully representing barcodes and persistence diagrams has been pursued along two main avenues: kernel methods and vectorizations. One vectorization is the Betti sequence, or Betti curve, derived from the persistence barcode. While the Betti sequence has been used in classification problems in various applications, to our knowledge, the stability of the sequence has never before been discussed. In this paper we show that the Betti sequence is unstable under the 1-Wasserstein metric with regard to small perturbations in the barcode from which it is calculated. In addition, we propose a novel stabilized version of the Betti sequence based on the Gaussian smoothing seen in the Stable Persistence Bag of Words for persistent homology. We then introduce the normalized cumulative Betti sequence and provide numerical examples that support the main statement of the paper.
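
A Betti sequence can be computed from a barcode in a few lines, which also makes the kind of sensitivity discussed in the abstract easy to see. The sketch below is an illustration under stated assumptions: bars are treated as half-open intervals [birth, death), the sequence is sampled on a fixed grid, and the function name and sample values are hypothetical. It does not implement the paper's stabilized or normalized cumulative variants.

```python
def betti_sequence(barcode, grid):
    """For each grid value t, count the bars [birth, death) that contain t."""
    return [sum(1 for birth, death in barcode if birth <= t < death) for t in grid]


grid = [0.0, 0.25, 0.5, 0.75, 1.0]
barcode = [(0.0, 1.0), (0.25, 0.75)]
print(betti_sequence(barcode, grid))    # [1, 2, 2, 1, 0]

# Add one very short bar: a tiny change to the barcode.
perturbed = barcode + [(0.5, 0.5 + 1e-6)]
print(betti_sequence(perturbed, grid))  # [1, 2, 3, 1, 0]
```

The added bar can be made arbitrarily short, yet the sampled value at t = 0.5 still changes by a full unit, so the change in the vector does not shrink along with the change in the barcode.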

Construction Sequence Analysis for Checking Stability in High-Rise Building under Construction (초고층 건물의 시공 중 안정성 검토를 위한 시공단계해석)

  • Kim, Jae-Yo
    • Proceedings of the Computational Structural Engineering Institute Conference / 2008.04a / pp.618-623 / 2008
  • Due to recent trends toward atypical plan shapes and zoning construction in high-rise buildings, building stability during construction is emerging as an important issue for design and construction planning. To ensure stability during construction, differential column shortening and lateral movements under unbalanced distributions of structural member self-weight, as well as the load flows before the member connections and the lateral load resisting system are completed, should be checked by construction sequence analysis. This paper presents a scheme of zone-based construction sequence analysis for checking the stability of a high-rise building under construction. The scheme is applied to the construction sequence analysis of a real high-rise building under construction.

Protein Sequence Search based on N-gram Indexing

  • Hwang, Mi-Nyeong;Kim, Jin-Suk
    • Bioinformatics and Biosystems / v.1 no.1 / pp.46-50 / 2006
  • With advances in experimental techniques in molecular biology, genomic and protein sequence databases are growing exponentially in size, and mean sequence lengths are also increasing. As these databases grow, it becomes difficult to search biological databases for sequences with significant homology to a query sequence. In this paper, we present an N-gram indexing method to retrieve similar sequences quickly, precisely, and with accuracy comparable to existing tools. The method regards a protein sequence as text written in a language of 20 amino acid codes and adopts fixed-length N-gram tokens as its indexing scheme for sequence strings. After such tokens are indexed for all the sequences in the database, sequences can be searched with information retrieval algorithms. Using this method, we have developed a protein sequence search system named ProSeS (PROtein Sequence Search). ProSeS is a protein sequence analysis system that provides overall analysis results such as similar sequences with significant homology, predicted subcellular locations of the query sequence, and major keywords extracted from the annotations of similar sequences. We show experimentally that the N-gram indexing approach significantly reduces retrieval time and that it is as accurate as the currently popular search tool BLAST.
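
The indexing idea can be sketched as a small inverted index over fixed-length tokens. This is a minimal illustration, not ProSeS itself: the function names, the choice of N = 3, the shared-token count used as a score, and the toy sequences are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def ngrams(sequence, n=3):
    """Return the overlapping fixed-length tokens of a sequence string."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def build_index(sequences, n=3):
    """Map each N-gram token to the set of sequence ids that contain it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for token in ngrams(seq, n):
            index[token].add(seq_id)
    return index


def search(index, query, n=3, top_k=5):
    """Rank indexed sequences by the number of distinct query tokens they share."""
    hits = Counter()
    for token in set(ngrams(query, n)):
        for seq_id in index.get(token, ()):
            hits[seq_id] += 1
    return hits.most_common(top_k)


# Toy database of amino acid strings (hypothetical ids and sequences).
database = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAYLAKQR",
    "P3": "GGGGSGGGGS",
}
index = build_index(database)
print(search(index, "MKTAYIAK"))  # [('P1', 6), ('P2', 3)]
```

Because a query touches only the posting sets of its own tokens, sequences that share nothing with it, such as P3 here, are never examined, which is where the retrieval-time saving of an index-based search comes from.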

Analysis of Sequence Impedances of 345kV Cable Transmission Systems (실계통 345kV 지중송전선 대칭좌표 임피던스의 해석)

  • Choi, Jong-Kee;Ahn, Yong-Ho;Yoon, Yong-Beum;Oh, Sei-Ill;Kwa, Yang-Ho;Lee, Myoung-Hee
    • The Transactions of The Korean Institute of Electrical Engineers / v.62 no.7 / pp.905-912 / 2013
  • Power system fault analysis is commonly based on the well-known symmetrical component method, which describes power system elements by their positive-, negative-, and zero-sequence impedances. For a balanced fault, such as a three-phase short circuit, a transmission line can be represented by its positive-sequence impedance alone. Most faults on transmission lines, however, are unbalanced faults, such as line-to-ground faults, so both positive- and zero-sequence impedances are required for fault analysis. When an unbalanced fault occurs, zero-sequence current flows through the earth and skywires in overhead transmission systems, and through the cable sheaths and earth in cable transmission systems. Since the distribution of zero-sequence current between the cable sheath and earth depends on both the sheath bonding and the grounding configuration, care must be taken when calculating the zero-sequence impedance of underground cable transmission lines. In this paper, conventional and EMTP-based sequence impedance calculation methods are described and applied to 345 kV cable transmission systems (4 circuits, OF 2000 mm²). The calculation results show that detailed circuit analysis is desirable to avoid sequence impedance calculation errors resulting from the various configurations of cable sheath bonding and grounding in underground cable transmission systems.