Development of a Fragment Assembly Program for DNA Sequences

  • Lee, Byung-Uk (Korea Research Institute of Bioscience and Biotechnology, KIST) ;
  • Park, Kie-Jung (Korea Research Institute of Bioscience and Biotechnology, KIST) ;
  • Park, Wan (Department of Microbiology, Kyungpook National University) ;
  • Park, Yong-Ha (Korea Research Institute of Bioscience and Biotechnology, KIST)
  • Published : 1997.12.01

Abstract

DNA fragment assembly is a major concern in shotgun DNA sequencing projects. The task is to reconstruct a consensus DNA sequence from a collection of randomly oriented fragments. We developed a computer program for DNA fragment assembly. The input to the program is a set of DNA fragment sequences, which may include IUB-IUPAC base codes. The program outputs the most probable reconstruction of the original DNA sequence in text or PostScript format. The program consists of four phases. The first phase quickly eliminates fragment pairs that cannot possibly overlap. In the second phase, the quality of the overlap between each pair is calculated as a score. In the third phase, overlapping pairs are sorted by score and the consistency of the overlaps is checked. The last phase determines the consensus sequences and displays them. The performance of the program was tested on sets of DNA fragment sequences generated by a fragmentation program from long DNA sequences in GenBank.
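
The four phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name and parameter in it (`may_overlap`, `overlap_score`, `choose_overlaps`, `merge`, `assemble`, `k`, `min_len`) is hypothetical. The sketch makes several simplifying assumptions: all fragments are in the same orientation, overlaps are gap-free suffix-prefix matches scored by their length, the phase-1 filter is a shared exact k-mer test (a heuristic that can miss overlaps containing ambiguity codes), and consistency is checked greedily, so that each fragment gets at most one predecessor and one successor and no cycle is formed.

```python
from itertools import permutations

# IUB-IUPAC codes mapped to the sets of bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
SETS = {code: frozenset(bases) for code, bases in IUPAC.items()}
CODES = {bases: code for code, bases in SETS.items()}  # covers all 15 non-empty subsets


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def may_overlap(a, b, k):
    """Phase 1 (assumed heuristic): keep a pair only if it shares an exact k-mer."""
    return not kmers(a, k).isdisjoint(kmers(b, k))


def overlap_score(a, b, min_len):
    """Phase 2 (simplified): length of the longest suffix of a compatible with a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if all(SETS[x] & SETS[y] for x, y in zip(a[-n:], b[:n])):
            return n
    return 0


def choose_overlaps(frags, k, min_len):
    """Phase 3: sort overlaps by score and greedily accept the consistent ones."""
    scored = []
    for i, j in permutations(range(len(frags)), 2):
        if may_overlap(frags[i], frags[j], k):
            n = overlap_score(frags[i], frags[j], min_len)
            if n:
                scored.append((n, i, j))
    scored.sort(reverse=True)

    root = list(range(len(frags)))  # union-find, used to reject cycles

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    successor, has_pred = {}, set()
    for n, i, j in scored:
        if i in successor or j in has_pred or find(i) == find(j):
            continue  # inconsistent with an overlap accepted earlier
        successor[i] = (j, n)
        has_pred.add(j)
        root[find(i)] = find(j)
    return successor, has_pred


def merge(a, b, n):
    """Join a and b over an n-base overlap, narrowing ambiguity codes where possible."""
    joined = []
    for x, y in zip(a[-n:], b[:n]):
        common = SETS[x] & SETS[y]
        # Fall back to the union code if earlier narrowing made the bases disagree.
        joined.append(CODES[common or (SETS[x] | SETS[y])])
    return a[:len(a) - n] + "".join(joined) + b[n:]


def assemble(frags, k=3, min_len=4):
    """Phase 4: walk each chain of accepted overlaps and emit one consensus per contig."""
    successor, has_pred = choose_overlaps(frags, k, min_len)
    contigs = []
    for start in range(len(frags)):
        if start in has_pred:
            continue  # not the first fragment of its chain
        contig, cur = frags[start], start
        while cur in successor:
            cur, n = successor[cur]
            contig = merge(contig, frags[cur], n)
        contigs.append(contig)
    return contigs


if __name__ == "__main__":
    print(assemble(["ACGTTGCA", "TGCAGGRTCC", "GGATCCAAGT"]))
```

In this toy input the ambiguity code `R` in the second fragment is resolved to `A` by the overlapping third fragment. A fuller implementation would also compare each fragment against reverse complements and score overlaps with an alignment that tolerates sequencing errors.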

Keywords

DNA fragment;Contig;IUB-IUPAC base;DNA assembly program
