• Title/Summary/Keyword: the all-pairs suffix-prefix problem

Experimental Analysis of Recent Works on the Overlap Phase of De Novo Sequence Assembly (De novo 시퀀스 어셈블리의 overlap 단계의 최근 연구 실험 분석)

  • Lim, Jihyuk; Kim, Sun; Park, Kunsoo
    • Journal of KIISE, v.45 no.3, pp.200-210, 2018
  • Given a set of DNA read sequences, de novo sequence assembly reconstructs a target sequence without using a reference sequence. Reconstruction requires an overlap phase, which computes the overlaps between every pair of reads. Because the overlap phase is the most time-consuming part of the whole assembly, the performance of the assembly depends on that of the overlap phase. The overlap phase has been studied extensively in various fields, and three state-of-the-art algorithms for it are Readjoiner, SOF, and the Lim-Park algorithm. Recently, the rapid development of sequencing technology has made it possible to produce large read datasets at low cost, and many platforms for generating DNA read datasets have been developed. Since these platforms produce datasets with different statistical characteristics, a performance evaluation of the overlap phase should cover datasets with these characteristics. In this paper, we compare and analyze the performance of the three algorithms on various large datasets.
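
To make concrete what the overlap phase computes, here is a minimal sketch of a naive baseline: for every ordered pair of reads, it finds the longest suffix of the first read that equals a prefix of the second (commonly formalized as the all-pairs suffix-prefix problem). This is not the method of Readjoiner, SOF, or the Lim-Park algorithm, which rely on far more efficient index structures. The function names, the `min_len` threshold, and the toy reads are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def longest_overlap(a: str, b: str, min_len: int = 1) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Only overlaps of at least `min_len` characters count; returns 0 otherwise.
    """
    max_len = min(len(a), len(b))
    # Try the longest candidate first so the first hit is the maximum.
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def all_pairs_overlaps(reads: List[str], min_len: int = 1) -> Dict[Tuple[int, int], int]:
    """Naive all-pairs suffix-prefix computation over ordered pairs (i, j), i != j."""
    overlaps: Dict[Tuple[int, int], int] = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue  # a read is not overlapped with itself
            k = longest_overlap(a, b, min_len)
            if k > 0:
                overlaps[(i, j)] = k
    return overlaps


if __name__ == "__main__":
    # Hypothetical toy reads, not taken from any real dataset.
    reads = ["ACGTAC", "TACGGA", "GGAACG"]
    print(all_pairs_overlaps(reads, min_len=3))
```

The nested loops compare every ordered pair of reads, so the running time grows quadratically with the number of reads (and further with read length), which illustrates why this phase dominates assembly time and why specialized algorithms are needed for large datasets.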