Dynamic Compressed Representation of Texts with Rank/Select

  • Lee, Sun-Ho (School of Computer Science and Engineering, Seoul National University)
  • Park, Kun-Soo (School of Computer Science and Engineering, Seoul National University)
  • Published : 2009.03.31

Abstract

Given a text T of length n over an alphabet of size $\sigma$, we present a compressed representation of T that supports the retrieval queries rank, select, and access and the update queries insert and delete. As a measure of compression we use the empirical entropy H(T), which gives a lower bound of nH(T) bits for any algorithm that compresses T from its plain size of n log $\sigma$ bits. Our representation takes this entropy bound, nH(T) $\leq$ n log $\sigma$ bits, plus additional space smaller than the text size, namely o(n log $\sigma$) + O(n) bits. Within this compressed space of nH(T) + o(n log $\sigma$) + O(n) bits, the representation answers queries in O(log n) time for an alphabet of size log n, and its extension answers queries in O(($1+\frac{\log \sigma}{\log \log n}$) log n) time for an alphabet of size $\sigma$.
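To make the query semantics and the space measure concrete, the following Python sketch may help. It is not the structure described in the abstract: it is an uncompressed reference that takes O(n) time per operation and only pins down what rank, select, access, insert, and delete return. The class name `NaiveDynamicText`, the function `zero_order_entropy`, the 0-based positions, and the inclusive rank convention are all assumptions made for illustration. The sketch also assumes that H(T) denotes the zero-order empirical entropy, which the abstract does not state explicitly.

```python
import math
from collections import Counter


def zero_order_entropy(text):
    """Bits per symbol under the zero-order empirical entropy.

    Assumption: this is the meaning of H(T); the abstract does not say
    which order of empirical entropy is intended.
    """
    n = len(text)
    if n == 0:
        return 0.0
    counts = Counter(text)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


class NaiveDynamicText:
    """Uncompressed reference with O(n) operations (hypothetical helper)."""

    def __init__(self, text=""):
        self.symbols = list(text)

    def access(self, i):
        """Symbol stored at 0-based position i."""
        return self.symbols[i]

    def rank(self, c, i):
        """Number of occurrences of c in positions 0..i, inclusive."""
        return self.symbols[: i + 1].count(c)

    def select(self, c, k):
        """0-based position of the k-th occurrence of c (k >= 1), or -1."""
        seen = 0
        for pos, s in enumerate(self.symbols):
            if s == c:
                seen += 1
                if seen == k:
                    return pos
        return -1

    def insert(self, c, i):
        """Insert symbol c so that it ends up at position i."""
        self.symbols.insert(i, c)

    def delete(self, i):
        """Remove the symbol at position i."""
        del self.symbols[i]
```

For a text such as `"aabb"`, `zero_order_entropy` gives 1 bit per symbol, so nH(T) is 4 bits, which matches n log $\sigma$ for a two-symbol alphabet. A skewed text yields a smaller nH(T), and that gap is what a compressed representation exploits.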
