• Title/Summary/Keyword: sequence retrieval

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences

  • Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong;Lee, Byoung-Yup
    • International Journal of Contents / v.3 no.2 / pp.18-24 / 2007
  • Biological sequences such as DNA and amino acid sequences typically contain a large number of items, and their frequent contiguous sequences often span hundreds of items or more. In biological sequence analysis (BSA), searching for frequent contiguous sequences is one of the most important operations. Many studies have addressed efficient sequential pattern mining, and most existing methods are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, because it grows sequential patterns from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. More recently, the MacosVSpan algorithm was proposed, building on the idea of prefixSpan, to significantly reduce the recursive process. However, it is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method for mining maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
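
To make the target of this abstract concrete, the following is a minimal brute-force sketch of what a "maximal frequent contiguous sequence" is. It is not the fixed-length spanning-tree method the paper proposes, and it would be far too slow for real genomic data. The function name, the `max_len` cap, and the definition of support as "number of input sequences containing the substring" are assumptions made for illustration.

```python
from collections import defaultdict


def maximal_frequent_contiguous(sequences, min_support, max_len):
    """Brute-force baseline (illustrative only, not the paper's algorithm)."""
    # Map every contiguous substring (up to max_len) to the set of
    # input sequences in which it occurs.
    occurrences = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq)):
            stop = min(len(seq), start + max_len)
            for end in range(start + 1, stop + 1):
                occurrences[seq[start:end]].add(idx)

    # Assumed support definition: number of sequences containing the substring.
    frequent = {s for s, ids in occurrences.items() if len(ids) >= min_support}

    # A frequent substring is non-maximal if some frequent substring that is
    # one character longer contains it as a prefix or a suffix.
    non_maximal = set()
    for f in frequent:
        if len(f) > 1:
            non_maximal.add(f[:-1])
            non_maximal.add(f[1:])
    return sorted(frequent - non_maximal)
```

For example, `maximal_frequent_contiguous(["ACGTAC", "TACGTA", "GGACGT"], min_support=3, max_len=6)` keeps only `"ACGT"`, because every other substring shared by all three inputs is contained in it.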

Unification System for Analysis of DNA Sequence (DNA 서열 분석을 위한 통합 시스템)

  • Song, Young-Ohk;Chang, Duk-Jin
    • The Journal of the Korea Contents Association / v.11 no.3 / pp.65-72 / 2011
  • We live in a time when practical uses of genetic information keep emerging thanks to advanced technology. As much research and development builds on the analysis of biological data, bioinformatics, which seeks to establish new relationships and information, increasingly needs tools that support the correct interpretation of data. In this paper, we develop a system that supplements the shortcomings of the various existing data analysis tools and offers users a more convenient research tool. The system is designed to provide ORF extraction, biological information retrieval, similarity comparison, and related tasks for biological data analysis in a unified environment rather than in separate ones, supplying the continuity that existing analysis systems lack.

Thai Classical Music Matching Using t-Distribution on Instantaneous Robust Algorithm for Pitch Tracking Framework

  • Boonmatham, Pheerasut;Pongpinigpinyo, Sunee;Soonklang, Tasanawan
    • Journal of Information Processing Systems / v.13 no.5 / pp.1213-1228 / 2017
  • The pitch tracking of music has been researched for several decades. Several improvements are possible by building a suitable t-distribution within the instantaneous robust algorithm for pitch tracking (IRAPT) framework to detect pitch accurately. This article shows how to detect the pitch of music with an improved detection method that applies a statistical method; the approach uses a pitch track, that is, a sequence of frequency bin numbers. This sequence is used to create an index that offers useful features for comparing similar songs. The pitch frequency spectrum is extracted using a modified IRAPT as a base, combined with the statistical method. The pitch detection algorithm was implemented, and the matching performance on Thai classical music was assessed to test the accuracy of the algorithm. We used the longest common subsequence to compare similarities in pitch sequence alignments in the music. The experimental results show that the retrieval accuracy for Thai classical music using the t-distribution of IRAPT (t-IRAPT) is 99.01% within the top-five ranking, with the shortest query sample being five seconds long.
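
The abstract compares pitch tracks with the longest common subsequence (LCS). The sketch below shows the standard dynamic-programming LCS applied to two sequences of frequency bin numbers. The normalisation by query length in `lcs_similarity` is an assumption for illustration; the abstract does not say how the authors turn the LCS into a matching score.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)  # DP row for the previous element of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(query, reference):
    """Assumed score: fraction of the query that is matched in order."""
    if not query:
        return 0.0
    return lcs_length(query, reference) / len(query)
```

A short query whose bins mostly appear, in order, inside a reference track scores close to 1.0 even when the reference contains extra notes between them, which is why LCS tolerates small insertions in a pitch track.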

A Study of Similarity Measures on Multidimensional Data Sequences Using Semantic Information (의미 정보를 이용한 다차원 데이터 시퀀스의 유사성 척도 연구)

  • Lee, Seok-Lyong;Lee, Ju-Hong;Chun, Seok-Ju
    • The KIPS Transactions:PartD / v.10D no.2 / pp.283-292 / 2003
  • One-dimensional time-series data have been studied in various database applications such as data mining and data warehousing. However, in the current complex business environment, multidimensional data sequences (MDSs) are becoming increasingly important in addition to one-dimensional time-series data. For example, a video stream can be modeled as an MDS in a multidimensional space with respect to color and texture attributes. In this paper, we propose effective similarity measures on which similar-pattern retrieval is based. An MDS is partitioned into segments, each of which is represented by various geometric and semantic features. The similarity measures are defined on the basis of these segments. Using the measures, irrelevant segments are pruned from a database with respect to a given query. Both data sequences and query sequences are partitioned into segments, and query processing is based on comparing the features of data and query segments instead of scanning all data elements of entire sequences.
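
The segment-then-prune idea can be sketched as follows. This is only an illustration under stated assumptions: segments are fixed-size, each segment is summarised by its axis-aligned bounding box (one possible geometric feature), and a data segment is kept when its box lies within a threshold of some query box. The abstract does not specify the actual features or measures, and the semantic features are not modelled here. All function names are hypothetical.

```python
from math import sqrt


def bounding_box(points):
    """Axis-aligned bounding box (lows, highs) of a list of equal-length tuples."""
    dims = range(len(points[0]))
    lows = tuple(min(p[d] for p in points) for d in dims)
    highs = tuple(max(p[d] for p in points) for d in dims)
    return lows, highs


def segment(sequence, size):
    """Assumption: fixed-size segments, each summarised by its bounding box."""
    return [bounding_box(sequence[i:i + size])
            for i in range(0, len(sequence), size)]


def box_distance(box_a, box_b):
    """Minimum Euclidean distance between two boxes (0 if they overlap)."""
    (low_a, high_a), (low_b, high_b) = box_a, box_b
    gaps = (max(lb - ha, la - hb, 0.0)
            for la, ha, lb, hb in zip(low_a, high_a, low_b, high_b))
    return sqrt(sum(g * g for g in gaps))


def candidate_segments(data_seq, query_seq, size, threshold):
    """Indices of data segments that survive pruning for this query."""
    query_boxes = segment(query_seq, size)
    return [i for i, box in enumerate(segment(data_seq, size))
            if any(box_distance(box, q) <= threshold for q in query_boxes)]
```

Only the surviving segments need an element-by-element comparison, which is the saving the abstract describes relative to scanning entire sequences.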

A motion descriptor design combining the global feature of an image and the local one of an moving object (영상의 전역 특징과 이동객체의 지역 특징을 융합한 움직임 디스크립터 설계)

  • Jung, Byeong-Man;Lee, Kyu-Won
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2012.10a / pp.898-902 / 2012
  • We propose a descriptor suitable for motion analysis that uses the motion features of moving objects in a real-time image sequence. Background learning is performed to segment moving objects from the background. We extract the motion trajectories of individual objects from the sequence of $1^{st}$-order moments of the moving objects. The center points of each object are managed in a linked list. The descriptor includes the $1^{st}$-order coordinates of moving objects that fall in the neighborhood of predefined positions in a grid pattern, the start frame number at which a moving object appeared in the scene, and the end frame number at which it disappeared. Video retrieval with the proposed descriptor, which combines global and local features, is more effective than conventional methods that adopt only a single global or local feature.
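
The trajectory step can be illustrated with a small sketch that computes the centroid (the first-order moments divided by the area) of a segmented object in each frame and records the first and last frames in which it is visible. It assumes one object per frame, already segmented into a binary mask given as rows of 0/1; background learning and the grid-based part of the descriptor are not modelled. The function names are hypothetical.

```python
def centroid(mask):
    """Centre of mass of a binary mask, or None if the mask is empty."""
    area = sum_x = sum_y = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                area += 1
                sum_x += x
                sum_y += y
    if area == 0:
        return None
    return (sum_x / area, sum_y / area)


def trajectory(masks):
    """Return (start_frame, end_frame, centroids) for one object, or None."""
    visible = []
    for frame_no, mask in enumerate(masks):
        c = centroid(mask)
        if c is not None:
            visible.append((frame_no, c))
    if not visible:
        return None
    return visible[0][0], visible[-1][0], [c for _, c in visible]
```

The returned tuple mirrors three ingredients named in the abstract: the frame where the object appears, the frame where it disappears, and the sequence of center points between them.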

Implementation of an Information Management System for Nucleotide Sequences based on BSML using Active Trigger Rules (BSML 기반 능동 트리거 규칙을 이용한 염기서열정보관리시스템의 구현)

  • Park Sung Hee;Jung Kwang Su;Ryu Keun Ho
    • Journal of KIISE:Databases / v.32 no.1 / pp.24-42 / 2005
  • Biological data, including genome sequences, are heterogeneous and varied. Although the need for sequence management systems that reflect biological characteristics has been raised, most current biological databases provide only restricted functions as repositories for biological data. This paper therefore describes a management system for nucleotide sequences at the level of a biological laboratory. It supports format transformation, editing, storage, and retrieval of nucleotide sequences collected from public databases, and handles sequences produced by experiments. It uses BSML, which is based on XML, as a common format to extract data fields and convert between heterogeneous sequence formats. To manage sequences and their changes, a version management system for the original DNA is required so as to detect the appearance of newly changed sequences and trigger database updates. Our experimental results show that applying active trigger rules to manage sequence changes can automatically store those changes in the database.
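
As a rough illustration of an active trigger rule that records sequence changes automatically, the sketch below uses an SQLite trigger through Python's standard `sqlite3` module. The abstract does not name the database system or schema used by the authors, so the table names, column names, and the accession `DEMO0001` are all hypothetical, and BSML parsing is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequence (
    id INTEGER PRIMARY KEY,
    accession TEXT NOT NULL,
    residues TEXT NOT NULL
);
CREATE TABLE sequence_history (
    sequence_id INTEGER NOT NULL,
    old_residues TEXT NOT NULL,
    new_residues TEXT NOT NULL,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Active rule: whenever the residues of a stored sequence really change,
-- copy the old and new versions into the history table.
CREATE TRIGGER log_sequence_change
AFTER UPDATE OF residues ON sequence
WHEN OLD.residues <> NEW.residues
BEGIN
    INSERT INTO sequence_history (sequence_id, old_residues, new_residues)
    VALUES (OLD.id, OLD.residues, NEW.residues);
END;
""")

# Hypothetical record; the accession is made up for the example.
conn.execute("INSERT INTO sequence (accession, residues) VALUES (?, ?)",
             ("DEMO0001", "ACGTAC"))
conn.execute("UPDATE sequence SET residues = ? WHERE accession = ?",
             ("ACGTTC", "DEMO0001"))
# Writing the same value again does not fire the rule (WHEN clause is false).
conn.execute("UPDATE sequence SET residues = ? WHERE accession = ?",
             ("ACGTTC", "DEMO0001"))

history = conn.execute(
    "SELECT sequence_id, old_residues, new_residues FROM sequence_history"
).fetchall()
```

The application code only issues ordinary updates; versioning happens inside the database, which is the point of using an active rule rather than logging changes by hand.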

Video Browsing Service Using An Efficient Scene Change Detection (효율적인 장면전환 검출을 이용한 비디오 브라우징 서비스)

  • Seong-Yoon Shin;Yang-Won Rhee
    • Journal of Internet Computing and Services / v.3 no.2 / pp.69-77 / 2002
  • Digital video has recently become one of the important information media delivered over the Internet and plays an increasingly important role in multimedia. This paper proposes a Video Browsing Service (VBS) that provides both video content retrieval and video browsing through a real-time user interface on the Web. For scene segmentation and key frame extraction from a video sequence, we propose an efficient scene change detection method that combines the RGB color histogram with the $\chi^2$ (chi-square) histogram. The resulting key frames are linked by both physical and logical indexing. The system provides VCR-like video editing and retrieval functions. Three elements, the date, the field, and the subject, are used for video browsing. The Video Browsing Service is implemented with MySQL, PHP, and JMF under the Apache Web Server.
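
One way to picture histogram-based scene change detection is the sketch below, which computes a plain RGB histogram difference and a chi-square histogram distance between consecutive frames. The abstract does not say how the two measures are combined, so the rule used here (flag a cut only when both exceed their thresholds), the bin count, and the threshold values are assumptions. Frames are assumed to be non-empty lists of `(r, g, b)` pixels with values 0 to 255.

```python
def rgb_histogram(frame, bins=4):
    """Concatenated per-channel histogram; each channel sums to 1."""
    hist = [0.0] * (3 * bins)
    width = 256 // bins
    for pixel in frame:
        for channel, value in enumerate(pixel):
            hist[channel * bins + min(value // width, bins - 1)] += 1.0
    n = len(frame)  # assumes the frame is non-empty
    return [h / n for h in hist]


def histogram_difference(h1, h2):
    """Sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def chi_square(h1, h2):
    """Chi-square distance; bins that are empty in both histograms are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def scene_changes(frames, diff_threshold=1.0, chi_threshold=0.5):
    """Assumed rule: a cut at frame i needs both measures above threshold."""
    hists = [rgb_histogram(f) for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        d = histogram_difference(hists[i - 1], hists[i])
        c = chi_square(hists[i - 1], hists[i])
        if d > diff_threshold and c > chi_threshold:
            cuts.append(i)
    return cuts
```

The first frame after each detected cut is a natural key frame candidate for the browsing index.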

Pathway Retrieval for Transcriptome Analysis using Fuzzy Filtering Technique and Web Service

  • Lee, Kyung-Mi;Lee, Keon-Myung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.12 no.2 / pp.167-172 / 2012
  • In biology, the advent of high-throughput technologies for sequencing, probing, and screening has produced volumes of data too large to handle manually, and biologists have resorted to software tools to handle them effectively. This paper introduces a bioinformatics tool that helps biologists find potentially interesting pathway maps from a transcriptome data set in which gene expression levels are described for both case and control samples. The tool accepts a transcriptome data set, then selects and categorizes some of the genes into four classes using a fuzzy filtering technique in which the classes are defined by membership functions. It collects and edits the pathway maps related to the selected genes without the analyst's intervention. It invokes a sequence of web service functions from KEGG, an online pathway database system, to retrieve related information, locate pathway maps, and manipulate them. It maintains all retrieved pathway maps in a local database and presents them to analysts through a graphical user interface. The tool has been used successfully to identify target genes for further analysis in a transcriptome study of human cytomegalovirus. It can considerably save analysts' time and effort by collecting and presenting the pathway maps that contain genes of interest once a transcriptome data set is given.
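
The fuzzy filtering step can be sketched with trapezoidal membership functions. Everything specific in this example is an assumption: the abstract states only that four classes are defined by membership functions, so the class names, the use of the log2 case/control ratio as the input, the breakpoints, and the 0.5 cutoff are hypothetical. The KEGG web service calls are not shown.

```python
from math import inf, log2


def trapezoid(x, a, b, c, d):
    """Membership rises on [a, b], equals 1 on [b, c], and falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Hypothetical classes over the log2 fold change (case / control).
CLASSES = {
    "strongly_down": (-inf, -inf, -2.0, -1.5),
    "down": (-2.0, -1.5, -1.0, -0.5),
    "up": (0.5, 1.0, 1.5, 2.0),
    "strongly_up": (1.5, 2.0, inf, inf),
}


def classify(case_level, control_level, cutoff=0.5):
    """Return the best-matching class, or None if the gene is filtered out."""
    change = log2(case_level / control_level)
    scores = {name: trapezoid(change, *params)
              for name, params in CLASSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= cutoff else None
```

Genes for which `classify` returns `None` are dropped, and only the remaining genes would be passed on to pathway lookup.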

Fast-Converging Algorithm for Wavefront Reconstruction based on a Sequence of Diffracted Intensity Images

  • Chen, Ni;Yeom, Jiwoon;Hong, Keehoon;Li, Gang;Lee, Byoungho
    • Journal of the Optical Society of Korea / v.18 no.3 / pp.217-224 / 2014
  • A major advantage of wavefront reconstruction based on a series of diffracted intensity images using only single-beam illumination is the simplicity of setup. Here we propose a fast-converging algorithm for wavefront calculation using single-beam illumination. The captured intensity images are resampled to a series of intensity images, ranging from highest to lowest resampling; each resampled image has half the number of pixels as the previous one. Phase calculation at a lower resolution is used as the initial solution phase at a higher resolution. This corresponds to separately calculating the phase for the lower- and higher-frequency components. Iterations on the low-frequency components do not need to be performed on the higher-frequency components, thus making the convergence of the phase retrieval faster than with the conventional method. The principle is verified by both simulation and optical experiments.

Korean Abbreviation Generation using Sequence to Sequence Learning (Sequence-to-sequence 학습을 이용한 한국어 약어 생성)

  • Choi, Su Jeong;Park, Seong-Bae;Kim, Kweon-Yang
    • KIISE Transactions on Computing Practices / v.23 no.3 / pp.183-187 / 2017
  • Smartphone users prefer fast reading and texting, so they frequently use abbreviated sequences of words and phrases. Abbreviations are now widely used, from chat terms to technical terms. Gathering abbreviations would therefore be helpful to many services, including information retrieval and recommendation systems. However, gathering abbreviations manually requires too much effort and cost, because new abbreviations are generated continuously whenever new material such as a TV program or a phenomenon appears. Abbreviations therefore need to be generated automatically. Existing methods for generating Korean abbreviations use a rule-based approach, which is limited in that it cannot generate irregular abbreviations. Another problem is deciding the correct abbreviation among the candidates generated by the rules. To address these limitations, we propose a method that generates Korean abbreviations automatically using sequence-to-sequence learning. Sequence-to-sequence learning can generate irregular abbreviations and avoids the problem of choosing the correct abbreviation among candidates, so it is suitable for generating Korean abbreviations. To evaluate the proposed method, we use two types of datasets. The experimental results show that our method is effective for irregular abbreviations.