Fast Result Enumeration for Keyword Queries on XML Data

  • Zhou, Junfeng (School of Information Science and Engineering, Yanshan University)
  • Chen, Ziyang (School of Information Science and Engineering, Yanshan University)
  • Tang, Xian (School of Economics and Management, Yanshan University)
  • Bao, Zhifeng (School of Computing, National University of Singapore)
  • Ling, Tok Wang (School of Computing, National University of Singapore)
  • Received: 2012.05.21
  • Accepted: 2012.05.22
  • Published: 2012.06.30

Abstract

In this paper, we focus on the efficient construction of tightest matched subtree (TMSubtree) results for keyword queries on extensible markup language (XML) data, based on smallest lowest common ancestor (SLCA) semantics. Here, "matched" means that every node in a returned subtree satisfies the constraint that the set of distinct keywords in the subtree rooted at that node is not subsumed by that of any of its sibling nodes, while "tightest" means that no two subtrees rooted at sibling nodes contain the same set of keywords. Let $d$ be the depth of a given TMSubtree and $m$ the number of keywords in a given query $Q$. We prove that if $d \leq m$, a matched subtree result has at most $2m!$ nodes; otherwise, its size is bounded by $(d - m + 2)m!$. Based on this theoretical result, we propose a pipelined algorithm that constructs TMSubtree results without rescanning all node labels. Experiments verify the benefits of our algorithm in aiding keyword search over XML data.
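The size bound and the two sibling constraints stated in the abstract can be illustrated with a minimal Python sketch. This is not the paper's pipelined algorithm. The function names `tmsubtree_size_bound` and `prune_siblings` are hypothetical, "2m!" is read as 2·(m!) (which agrees with the second bound at d = m), and "subsumed" is assumed to mean "is a proper subset of".

```python
from math import factorial


def tmsubtree_size_bound(d: int, m: int) -> int:
    """Upper bound on the node count of a matched subtree, as stated in the abstract.

    d: depth of the subtree; m: number of query keywords.
    Assumption: "2m!" is read as 2 * (m!).
    """
    if d < 0 or m < 1:
        raise ValueError("expected d >= 0 and m >= 1")
    if d <= m:
        return 2 * factorial(m)
    return (d - m + 2) * factorial(m)


def prune_siblings(siblings):
    """Filter a list of (node_name, keyword_set) pairs for sibling nodes.

    "Matched" (assumed reading): drop a sibling whose keyword set is a
    proper subset of another sibling's keyword set.
    "Tightest": among siblings with identical keyword sets, keep only the first.
    """
    keyword_sets = [frozenset(kws) for _, kws in siblings]
    kept = []
    seen = set()
    for (name, _), kws in zip(siblings, keyword_sets):
        if any(kws < other for other in keyword_sets):
            continue  # subsumed by a sibling: violates "matched"
        if kws in seen:
            continue  # duplicate keyword set: violates "tightest"
        seen.add(kws)
        kept.append(name)
    return kept
```

For example, with m = 3 keywords the bound is 12 nodes for any depth d ≤ 3, and it grows by m! = 6 nodes for each additional level beyond that.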
