A New Merging Algorithm for Constructing Suffix Trees for Integer Alphabets

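  • Dong Kyue Kim (School of Electrical and Computer Engineering, Pusan National University)
  • Jeong Seop Sim (School of Computer Science and Engineering, Seoul National University)
  • Kunsoo Park (School of Computer Science and Engineering, Seoul National University)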
  • Published : 2002.02.01

Abstract

A recent approach to constructing the suffix tree $T_s$ of a given string S is to recursively construct a suffix tree $T_o$ for the odd positions, construct a suffix tree $T_e$ for the even positions from $T_o$, and then merge $T_o$ and $T_e$ into $T_s$. Constructing suffix trees for integer alphabets in linear time had been a major open problem on index data structures. Farach used this approach and gave the first linear-time algorithm for integer alphabets. The hardest part of Farach's algorithm is the merging step. In this paper we present a new and simpler merging algorithm based on a coupled BFS (breadth-first search). Our merging algorithm is more intuitive than Farach's coupled DFS (depth-first search) merging, and thus it can be easily extended to other applications.
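The three-step scheme described in the abstract (odd positions, even positions derived from the odd ones, then a merge) can be illustrated with a small sketch. This is not the paper's algorithm: it works on sorted lists of suffix positions rather than on suffix trees, it sorts the odd suffixes directly instead of recursing, and it merges by naive suffix comparison rather than by a coupled BFS or DFS, so it is not linear time. The function name `odd_even_suffix_order` is hypothetical. Positions are 1-based, as in the abstract.

```python
def odd_even_suffix_order(s):
    """Illustrative only: order the suffixes of s via the odd/even split.

    s may be a string or a list of integers (an integer alphabet).
    Returns (odd, even, merged), each a list of 1-based start positions
    in lexicographic order of the corresponding suffixes.
    """
    n = len(s)

    # Step 1 (stand-in for the recursive construction of T_o):
    # order the suffixes that start at odd positions by direct sorting.
    odd = sorted(range(1, n + 1, 2), key=lambda p: s[p - 1:])

    # rank[p] is the order of the odd suffix starting at position p.
    rank = {p: r for r, p in enumerate(odd, start=1)}
    rank[n + 1] = 0  # empty suffix sorts first (used only when n is even)

    # Step 2 (stand-in for deriving T_e from T_o): the suffix at an even
    # position p is the symbol s[p] followed by the odd suffix at p + 1,
    # so the pair (symbol, rank of that odd suffix) determines its order.
    even = sorted(range(2, n + 1, 2), key=lambda p: (s[p - 1], rank[p + 1]))

    # Step 3 (naive stand-in for the merging step): a standard two-way
    # merge that compares suffixes directly.
    merged, i, j = [], 0, 0
    while i < len(odd) and j < len(even):
        if s[odd[i] - 1:] < s[even[j] - 1:]:
            merged.append(odd[i])
            i += 1
        else:
            merged.append(even[j])
            j += 1
    merged.extend(odd[i:])
    merged.extend(even[j:])
    return odd, even, merged


if __name__ == "__main__":
    odd, even, merged = odd_even_suffix_order("banana")
    print(odd, even, merged)  # [1, 5, 3] [6, 4, 2] [6, 4, 2, 1, 5, 3]
```

Step 2 is the part that makes the approach attractive: once the odd suffixes are ordered, the even ones follow from a sort on pairs, with no further suffix comparisons. Step 3 is where the sketch takes a shortcut, and it corresponds to the merging step that the abstract identifies as the hardest part.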
