Time and Space Efficient Search with Suffix Arrays

  • Published: 2005.06.01

Abstract

Suffix trees and suffix arrays are widely used to search a text $T$ of length $n$ over an alphabet $\Sigma$ for a pattern $P$ efficiently. For long texts, suffix arrays are preferred because they take less space than suffix trees. Recently, $O(|P| \cdot |\Sigma|)$-time and $O(|P| \cdot \log|\Sigma|)$-time search algorithms on suffix arrays were developed. In this paper we present time- and space-efficient search algorithms on suffix arrays. One algorithm runs in $O(|P|)$ time using $O(n \cdot |\Sigma|)$ bits of space, and the other runs in $O(|P| \cdot \log|\Sigma|)$ time using $O(n \log|\Sigma| + |\Sigma| \cdot n \log\log n / \log n)$ bits of space; the second uses less space and is still fast. Experiments show that our algorithms are more efficient in both time and space than previous algorithms.
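For orientation, the following is a minimal sketch of the classic baseline that suffix-array search algorithms improve on: binary search over the sorted suffixes, which costs $O(|P| \cdot \log n)$ character comparisons. It is not one of the algorithms presented in this paper. The function names (`build_suffix_array`, `search`, `occurrences`) are hypothetical, and the naive construction is chosen for brevity rather than speed.

```python
from typing import List, Tuple


def build_suffix_array(text: str) -> List[int]:
    """Naive construction (an assumption for brevity, not linear time):
    sort the suffix start positions by the suffixes they denote."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def search(text: str, sa: List[int], pattern: str) -> Tuple[int, int]:
    """Return the half-open range [start, end) of suffix-array slots
    whose suffixes begin with pattern, using two binary searches."""
    m = len(pattern)

    # Lower bound: first slot whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first slot whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def occurrences(text: str, pattern: str) -> List[int]:
    """All start positions of pattern in text, in increasing order."""
    sa = build_suffix_array(text)
    start, end = search(text, sa, pattern)
    return sorted(sa[start:end])
```

For example, `occurrences("banana", "ana")` yields the positions 1 and 3. Each binary-search step compares at most $|P|$ characters, which is where the $\log n$ factor in the baseline comes from; the algorithms summarized in the abstract replace that factor with a dependence on $|\Sigma|$ or remove it altogether.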
