An Efficient Local Alignment Algorithm for DNA Sequences including N and X

Kim, Jin-Wook

Journal of KIISE:Computing Practices and Letters (한국정보과학회논문지:컴퓨팅의 실제 및 레터)

Volume 16, Issue 3 / Pages 275-280 / 2010 / 1229-7712 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

Kim, Jin-Wook (School of Computer and Information Engineering, Inha University)

Received : 2009.11.30
Accepted : 2010.01.11
Published : 2010.03.15


Abstract

A local alignment algorithm finds a pair of substrings, one from each of two given strings, that are similar to each other. A DNA sequence may contain not only A, C, G, and T but also N and X, which are used when base information has been lost during sequencing for experimental or other reasons. In this paper, we present an efficient algorithm that finds a local alignment of two DNA sequences containing N and X under the affine gap penalty metric. The algorithm extends the Kim-Park algorithm, which handles only N, so that it handles both N and X, and the result is general enough to apply directly when further characters with similar properties are added.
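To make the problem setting concrete, the following is a minimal sketch of local alignment under an affine gap penalty, using the standard quadratic-time dynamic programming of Smith-Waterman and Gotoh (both listed in the references). It is not the efficient algorithm of this paper. The function names, the gap parameters, and in particular the scores assigned to N and X are illustrative assumptions; the abstract does not state the scoring the paper uses.

```python
NEG_INF = float("-inf")

def score(a, b, match=1, mismatch=-1, n_score=0, x_score=-1):
    """Score one aligned pair of characters.

    Assumed scoring (not taken from the paper): any pair involving X
    scores x_score, any other pair involving N scores n_score.
    """
    if a == "X" or b == "X":
        return x_score
    if a == "N" or b == "N":
        return n_score
    return match if a == b else mismatch

def local_align_affine(s, t, gap_open=2, gap_extend=1):
    """Return the best local alignment score of s and t.

    A gap of length k costs gap_open + k * gap_extend (affine penalty).
    Runs in O(len(s) * len(t)) time and O(len(t)) space.
    """
    n = len(t)
    h_prev = [0] * (n + 1)        # best score ending at (i-1, j)
    f_prev = [NEG_INF] * (n + 1)  # best score ending at (i-1, j) with a gap in t
    best = 0
    for i in range(1, len(s) + 1):
        h = [0] * (n + 1)
        f = [NEG_INF] * (n + 1)
        e = NEG_INF               # best score ending at (i, j) with a gap in s
        for j in range(1, n + 1):
            # Extend an open gap, or open a new one from the best score.
            e = max(e - gap_extend, h[j - 1] - gap_open - gap_extend)
            f[j] = max(f_prev[j] - gap_extend,
                       h_prev[j] - gap_open - gap_extend)
            diag = h_prev[j - 1] + score(s[i - 1], t[j - 1])
            # Local alignment: a score never drops below zero.
            h[j] = max(0, diag, e, f[j])
            best = max(best, h[j])
        h_prev, f_prev = h, f
    return best
```

For example, under these assumed scores `local_align_affine("ACNT", "ACGT")` treats the N as a neutral position, so the whole of both strings aligns with three matches and one zero-score pair.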

Keywords

DNA sequence; local alignment

References

D. Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press, New York, 1997.
P. Green, PHRAP, http://www.phrap.org.
E.W. Myers, G.G. Sutton, A.L. Delcher, I.M. Dew, D.P. Fasulo, et al., A Whole-Genome Assembly of Drosophila, Science, 287, pp.2196-2204, 2000. https://doi.org/10.1126/science.287.5461.2196
S. Batzoglou, D.B. Jaffe, K. Stanley, J. Butler, et al., ARACHNE: A Whole-Genome Shotgun Assembler, Genome Research, 12, pp.177-189, 2002. https://doi.org/10.1101/gr.208902
J. Wang, G.K. Wong, P. Ni, et al., RePS: A Sequence Assembler that Masks Exact Repeats Identified from the Shotgun Data, Genome Research, 12, pp.824-831, 2002. https://doi.org/10.1101/gr.165102
J.W. Kim, K. Roh, K. Park, H. Park, J. Seo, MLP: Mate-Based Sequence Layout with PHRAP, Bioinformatics and Biosystems, 1(1), pp.61-66, 2006.
T.F. Smith, M.S. Waterman, Identification of Common Molecular Subsequences, Journal of Molecular Biology, 147, pp.195-197, 1981. https://doi.org/10.1016/0022-2836(81)90087-5
O. Gotoh, An Improved Algorithm for Matching Biological Sequences, Journal of Molecular Biology, 162, pp.705-708, 1982. https://doi.org/10.1016/0022-2836(82)90398-9
J.W. Kim, K. Park, An Efficient Alignment Algorithm for Masked Sequences, Theoretical Computer Science, 370, pp.19-33, 2007. https://doi.org/10.1016/j.tcs.2006.10.003
NC-IUB, Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences. Recommendations 1984, European Journal of Biochemistry, 150, pp.1-5, 1985. https://doi.org/10.1111/j.1432-1033.1985.tb08977.x
J.W. Kim, A. Amir, G.M. Landau, K. Park, Computing Similarity of Run-Length Encoded Strings with Affine Gap Penalty, Theoretical Computer Science, 395, pp.268-282, 2008. https://doi.org/10.1016/j.tcs.2008.01.008
