• Title/Summary/Keyword: burrows-wheeler transform


Improved First-Phoneme Searches Using an Extended Burrows-Wheeler Transform (확장된 버로우즈-휠러 변환을 이용한 개선된 한글 초성 탐색)

  • Kim, Sung-Hwan;Cho, Hwan-Gue
    • KIISE Transactions on Computing Practices / v.20 no.12 / pp.682-687 / 2014
  • First-phoneme queries are an important feature for improving the usability of interfaces that are error-prone because of their restricted input environment, such as navigation devices and mobile devices. In this paper, we propose a time- and space-efficient data structure for Korean first-phoneme queries that disassembles Korean strings phoneme by phoneme, rearranges them into circular strings, and indexes them using the extended Burrows-Wheeler Transform. We demonstrate that the proposed method can process more types of queries in less space than previous methods. We also show that it improves search time when the query is shorter and the proportion of first phonemes is higher.
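The phoneme-wise disassembly step mentioned in the abstract can be illustrated with standard Unicode arithmetic for precomposed Hangul syllables. This is a minimal sketch, not the authors' implementation: the function names `decompose` and `first_phonemes` are hypothetical, and the circular-string rearrangement and extended BWT indexing are not shown.

```python
# Precomposed Hangul syllables occupy U+AC00..U+D7A3 and are laid out as
# (initial * 21 + medial) * 28 + final, so a syllable can be split
# arithmetically. The tables below use Hangul compatibility jamo.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 27 finals + "none"


def decompose(ch):
    """Split one character into its phonemes (hypothetical helper).

    Characters outside the precomposed Hangul block are returned unchanged.
    """
    code = ord(ch) - 0xAC00
    if not 0 <= code < 19 * 21 * 28:
        return (ch,)
    initial, rest = divmod(code, 21 * 28)
    medial, final = divmod(rest, 28)
    parts = (INITIALS[initial], MEDIALS[medial], FINALS[final])
    return tuple(p for p in parts if p)  # drop the empty "no final" slot


def first_phonemes(text):
    """Return the first phoneme of every character in `text`."""
    return "".join(decompose(ch)[0] for ch in text)


if __name__ == "__main__":
    print(decompose("산"))
    print(first_phonemes("부산"))
```

A first-phoneme query such as "ㅂㅅ" can then be compared against `first_phonemes` of each indexed string; the paper's contribution is doing this lookup through an index rather than by scanning.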

Burrows-Wheeler Transform based Lossless Image Compression using Subband Decomposition and Gradient Adjusted Prediction (대역분할과 GAP를 이용한 BWT기반의 무손실 영상 압축)

  • 윤정오;고승권;성우석;황찬식
    • The Journal of Korean Institute of Communications and Information Sciences / v.26 no.9B / pp.1259-1266 / 2001
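  • Recently, the Burrows-Wheeler Transform (BWT), a block-sorting algorithm with excellent text compression performance, was introduced. However, applying the BWT directly to image compression cannot be expected to give satisfactory results, because the correlation found in images differs from that found in text. This paper therefore proposes two methods: one applies the BWT after reducing the correlation between image pixels through hierarchical subband decomposition with a reversible L-SSKF (Lossless Symmetric Short Kernel Filter), and the other applies the BWT after using GAP (Gradient Adjusted Prediction) to reduce the correlation that is concentrated in the LL band. Experimental results confirm that the proposed methods improve compression performance compared with the existing lossless JPEG standard and an LZ-based compression method (PKZIP).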


Integrative Comparison of Burrows-Wheeler Transform-Based Mapping Algorithm with de Bruijn Graph for Identification of Lung/Liver Cancer-Specific Gene

  • Ajaykumar, Atul;Yang, Jung Jin
    • Journal of Microbiology and Biotechnology / v.32 no.2 / pp.149-159 / 2022
  • Cancers of the lung and liver are among the top 10 leading causes of cancer death worldwide. Thus, it is essential to identify the genes specifically expressed in these two cancer types to develop new therapeutics. Although a large amount of messenger RNA (mRNA) sequencing data related to these cancer cells is available due to the advancement of next-generation sequencing (NGS) technologies, optimized data processing methods need to be developed to identify novel cancer-specific genes. Here, we conducted an analytical comparison between Bowtie2, a Burrows-Wheeler transform-based alignment tool, and Kallisto, which adopts pseudo-alignment based on a transcriptome de Bruijn graph, using mRNA sequencing data on normal cells and lung/liver cancer tissues. Before using cancer data, simulated mRNA sequencing reads were generated, and the high Transcripts Per Million (TPM) values were compared. mRNA sequencing read data on lung/liver cancer cells were also extracted and quantified. While Kallisto could directly give the output in TPM values, Bowtie2 provided counts. Thus, TPM values were calculated by processing the Sequence Alignment Map (SAM) file in R using the package Rsubread and subsequently in Python. The analysis of the simulated sequencing data revealed that Kallisto could detect more transcripts and had a higher overlap than Bowtie2. The evaluation of these two data processing methods using known lung cancer biomarkers concludes that, in standard settings without any dedicated quality control, Kallisto is more effective at producing faster and more accurate results than Bowtie2. The same conclusions were drawn and confirmed with known biomarkers specific to liver cancer.
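The abstract notes that read counts had to be converted to TPM before the two tools could be compared. The conversion itself follows the standard TPM definition, sketched below. The abstract does not show the authors' script, so the function name `counts_to_tpm` and the sample numbers are hypothetical, and transcript lengths are assumed to be already known.

```python
def counts_to_tpm(counts, lengths):
    """Convert per-transcript read counts to Transcripts Per Million.

    counts  -- reads assigned to each transcript
    lengths -- transcript lengths in bases, same order as `counts`

    TPM_i = (count_i / length_i) / sum_j(count_j / length_j) * 1e6
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same size")
    # Length-normalised rate: reads per base of transcript.
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Scale so that all values sum to one million.
    return [r / total * 1e6 for r in rates]


if __name__ == "__main__":
    # Hypothetical counts and lengths, for illustration only.
    values = counts_to_tpm([100, 300, 50], [1000, 1500, 500])
    print([round(v, 1) for v in values])
```

Because TPM values always sum to one million within a sample, they can be compared between tools that report different raw quantities.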

Lossless image compression using subband decomposition and BW transform (대역분할과 BW 변환을 이용한 무손실 영상압축)

  • 윤정오;박영호;황찬식
    • Journal of Korea Society of Industrial Information Systems / v.5 no.1 / pp.102-107 / 2000
  • In general, text compression techniques cannot be used directly for image compression because the models of text and images differ. Recently, a new class of text compression, namely the block-sorting algorithm based on the Burrows-Wheeler transform (BWT), has given excellent results in text compression. However, applying it directly to image compression gives poor results. We therefore propose a simple method to improve lossless image compression performance. The proposed method consists of three steps. The image is decomposed into ten subbands using a symmetric short kernel filter. The resulting subbands are block-sorted with the BWT, and the redundancy is removed with an adaptive arithmetic coder. Experimental results show that the proposed method outperforms lossless JPEG and an LZ-based compression method (PKZIP).
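The block-sorting step at the centre of this pipeline can be sketched as follows. This is a naive textbook version of the BWT and its inverse over strings, assuming a sentinel character that does not occur in the input; it is not the paper's implementation, and the subband decomposition and arithmetic coding stages are not shown.

```python
def bwt(text, end="\0"):
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    `end` is a sentinel assumed to be absent from `text` and to sort
    before every other character.
    """
    if end in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, end="\0"):
    """Naive inverse: repeatedly prepend the last column and re-sort."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the sentinel is the original text.
    original = next(row for row in table if row.endswith(end))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(repr(transformed))         # 'annb\x00aa'
    print(inverse_bwt(transformed))  # banana
```

The transform groups symbols that share a following context, which is why an entropy coder applied afterwards tends to do better than on the raw input. Sorting full rotations costs far more time and memory than a suffix-array construction, so this form suits small examples only.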


A Fragmentation and Search Method of Query Document for Partially Plagiarized Section Detection (부분표절구간 검출을 위한 질의문서의 분할 및 탐색 기법)

  • Ock, Chang-Seok;Seo, Jong-Kyu;Cho, Hwan-Gue
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.586-589 / 2012
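  • With plagiarism-related issues drawing attention, research on plagiarism detection methods is being actively pursued. Plagiarized-section detection generally relies on relatively simple vocabulary-based string processing rather than semantic approaches such as complex natural language processing. Representative methods include fingerprinting and sequence alignment. However, using these methods to check large documents for plagiarism raises problems of time and space complexity. To overcome this drawback, this paper adapts a search method based on the BWT (Burrows-Wheeler Transform) [1], which is used in NGS (Next Generation Sequencing). In addition, to detect partially plagiarized sections and improve accuracy, the query document is split into small fragments, and a query search is performed on each fragment. This paper introduces two methods for splitting the query document: one using k-mer analysis and one using random-split analysis. The strengths and weaknesses of each method are analyzed experimentally, and the detection accuracy for actual partially plagiarized sections is measured.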
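The idea of fragmenting a query document and searching each fragment separately can be sketched as below. The abstract does not define either splitting scheme, so both definitions here are assumptions: `kmer_fragments` takes fixed-length windows and `random_fragments` draws random-length pieces at random positions. Python's built-in substring test stands in for the BWT-based search the paper uses, and all function names are hypothetical.

```python
import random


def kmer_fragments(query, k, step=1):
    """Fixed-length windows of size k, as (start, fragment) pairs."""
    return [(i, query[i:i + k]) for i in range(0, len(query) - k + 1, step)]


def random_fragments(query, count, min_len, max_len, seed=0):
    """Randomly placed fragments with lengths in [min_len, max_len]."""
    max_len = min(max_len, len(query))
    if min_len < 1 or min_len > max_len:
        raise ValueError("invalid fragment length range")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    fragments = []
    for _ in range(count):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(query) - length)
        fragments.append((start, query[start:start + length]))
    return fragments


def matched_fragments(fragments, source):
    """Keep fragments that occur verbatim in `source`.

    Stand-in only: a real system would query a BWT-based index here.
    """
    return [(start, frag) for start, frag in fragments if frag in source]


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog"
    query = "a quick brown cat"
    hits = matched_fragments(kmer_fragments(query, 5), source)
    print([start for start, _ in hits])  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Runs of consecutive matching start positions mark a candidate copied section inside an otherwise different query, which is the partial-plagiarism case the abstract targets.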