Parallel Approximate String Matching with k-Mismatches for Multiple Fixed-Length Patterns in DNA Sequences on Graphics Processing Units

Ho, ThienLuan; Kim, HyunJin; Oh, SeungRohk

doi:10.5370/KIEE.2017.66.6.955

전기학회논문지 (The Transactions of The Korean Institute of Electrical Engineers), Vol. 66, No. 6, pp. 955-961, 2017
pISSN 1975-8359 / eISSN 2287-4364

Publisher: 대한전기학회 (The Korean Institute of Electrical Engineers)

Affiliation (all authors): Dept. of Electronics and Electrical Engineering, Dankook University

Received: 2016.08.17
Accepted: 2017.05.02
Published: 2017.06.01

Abstract

In this paper, we propose a parallel approximate string matching algorithm with k-mismatches for multiple fixed-length patterns (PMASM) in DNA sequences. PMASM extends parallel single-pattern approximate string matching algorithms to efficiently compute the Hamming distances between substrings of the input and multiple patterns of the same fixed length. In the preprocessing phase of PMASM, all target patterns are binary encoded and stored in a look-up memory. For each character of the input string, the Hamming distances between the current substring and all patterns are then updated simultaneously, using the encoded information in the look-up memory. PMASM uses graphics processing units (GPUs) to perform these computations in parallel. This paper presents three GPU implementations of PMASM: thread PMASM, block-thread PMASM, and shared-mem PMASM. The shared-mem PMASM method shows how to make effective use of the GPU's parallel capacity, and it exploits special features of the CUDA (Compute Unified Device Architecture) memory structure to optimize performance. In experiments with DNA sequences, the proposed PMASM on a GPU is 385, 77, and 64 times faster than the traditional naive algorithm, the shift-add algorithm, and a single-threaded PMASM implementation on a CPU, respectively. When the naive and shift-add algorithms run on the same NVIDIA GPU model, the proposed approach still improves performance by up to 44% and 21%, respectively.
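The abstract describes PMASM only at a high level. The following is a minimal CPU sketch in Python of the general idea, built on assumptions that do not come from the paper: the text and patterns use only the bases A, C, G, and T; the "look-up memory" is modeled as one packed integer per base, holding a mismatch bit for every (pattern, position) pair; and the running mismatch counters of all patterns share one packed integer, so a single shift-and-add per input character updates every pattern's Hamming distance at once. That counter packing follows the style of the shift-add algorithm the abstract compares against, and the paper's actual binary encoding may differ. All names here (`build_lookup`, `pmasm_sketch`, `naive_scan`, `chunked_scan`) are hypothetical. `chunked_scan` only imitates on the CPU how a thread-level GPU version might divide window start positions among threads; it is not the paper's CUDA kernel.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"  # assumption: text and patterns contain only these four bases


def build_lookup(patterns: List[str]) -> Tuple[Dict[str, int], int, int]:
    """Preprocessing (hypothetical encoding): one packed look-up word per base.

    Field (p * m + j) of lookup[c] is 1 iff patterns[p][j] != c.
    Each field is b bits wide, so later counter additions never carry.
    """
    m = len(patterns[0])
    if m == 0 or any(len(p) != m for p in patterns):
        raise ValueError("patterns must be non-empty and share one fixed length")
    b = m.bit_length()  # counters never exceed m, and m < 2**b
    lookup = {}
    for c in BASES:
        word = 0
        for p_idx, pat in enumerate(patterns):
            for j, ch in enumerate(pat):
                if ch != c:
                    word |= 1 << ((p_idx * m + j) * b)
        lookup[c] = word
    return lookup, m, b


def pmasm_sketch(text: str, patterns: List[str], k: int) -> List[Tuple[int, int]]:
    """Return (start, pattern_index) for every window within k mismatches."""
    lookup, m, b = build_lookup(patterns)
    n_pat = len(patterns)
    field = (1 << b) - 1
    # Mask that drops bits shifted past the last field and clears field 0 of
    # every pattern, so counters never leak from pattern p into pattern p + 1.
    keep = (1 << (n_pat * m * b)) - 1
    for p_idx in range(n_pat):
        keep &= ~(field << (p_idx * m * b))
    state, hits = 0, []
    for i, c in enumerate(text):
        # One shift-and-add updates the counters of all patterns at once.
        state = ((state << b) & keep) + lookup[c]
        if i >= m - 1:
            for p_idx in range(n_pat):
                # The last field of each pattern holds the Hamming distance
                # of the window text[i - m + 1 : i + 1].
                dist = (state >> ((p_idx * m + m - 1) * b)) & field
                if dist <= k:
                    hits.append((i - m + 1, p_idx))
    return hits


def naive_scan(text: str, patterns: List[str], k: int) -> List[Tuple[int, int]]:
    """Reference: compare every window against every pattern directly."""
    m = len(patterns[0])
    hits = []
    for s in range(len(text) - m + 1):
        for p_idx, pat in enumerate(patterns):
            if sum(x != y for x, y in zip(text[s:s + m], pat)) <= k:
                hits.append((s, p_idx))
    return hits


def chunked_scan(text: str, patterns: List[str], k: int, n_workers: int) -> List[Tuple[int, int]]:
    """CPU imitation of thread-level splitting: each worker owns a contiguous
    range of window starts and reads m - 1 extra characters past its range."""
    m = len(patterns[0])
    n_windows = len(text) - m + 1
    if n_windows <= 0:
        return []
    per = -(-n_windows // n_workers)  # ceiling division
    hits = []
    for w in range(n_workers):
        lo, hi = w * per, min((w + 1) * per, n_windows)
        if lo < hi:
            sub = text[lo:hi + m - 1]
            hits.extend((lo + s, p) for s, p in pmasm_sketch(sub, patterns, k))
    return hits
```

For example, `pmasm_sketch("ACGTACGA", ["ACG", "CGA"], 0)` returns `[(0, 0), (4, 0), (5, 1)]`: pattern 0 occurs exactly at offsets 0 and 4, and pattern 1 at offset 5. In this sketch each counter field needs `m.bit_length()` bits, so a GPU version built on fixed-width machine words would presumably fit only a limited number of patterns per word; how the paper handles that trade-off is not stated in the abstract.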
