An Efficient Keyword Search Method on RDF Data

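  • 김진하 (NHN, Service Management System Lab, Quality Management System Development Team)
  • 송인철 (KAIST, Department of Computer Science)
  • 김명호 (KAIST, Department of Computer Science)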
  • Published : 2008.12.15

Abstract

Recently, there has been much work on supporting keyword search not only for text documents but also for structured data such as relational data, XML data, and RDF data. In this paper, we propose an efficient keyword search method for RDF data. The proposed method first groups related nodes and edges in RDF data graphs, which reduces the data size for efficient keyword search and allows relevant information to be returned together in query answers. The proposed method also uses the semantics of RDF data to measure the relevance of nodes and edges to the keywords when ranking search results. Experimental results on real RDF data show that the proposed method reduces the size of the RDF data by about half and is up to 5 times faster than previous methods.

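Recently, research has been actively conducted on supporting search not only over documents and web pages but also over structured data such as relational data, XML data, and RDF data. This paper proposes an efficient search method for RDF data. The proposed method first groups related nodes and edges in the RDF data to create a new RDF graph, which reduces the size of the RDF data to improve search performance and allows related information to be returned together in search results. In addition, when assigning keyword relevance scores to the nodes and edges of the RDF data graph in order to rank search results, the method exploits the characteristics of RDF ontology data and thereby returns results that better match the user's intent. A performance comparison using real RDF data shows that the proposed method reduces the size of the RDF data by up to a factor of two and is up to 5 times faster than existing methods.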
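
The abstract does not spell out the grouping rule, so the following is only a minimal sketch of the general idea, under a loudly stated assumption: each resource node absorbs its literal-valued properties, so that literals stop being separate nodes and a keyword hit returns the whole group. The triples, the `ex:` prefix, and the helper names (`group_literals`, `graph_size`, `match`) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object). For this sketch only, a
# leading '"' marks a literal; everything else is a resource identifier.
TRIPLES = [
    ("ex:paper1", "ex:title", '"Keyword Search on RDF Data"'),
    ("ex:paper1", "ex:year", '"2008"'),
    ("ex:paper1", "ex:author", "ex:kim"),
    ("ex:kim", "ex:name", '"Kim"'),
    ("ex:kim", "ex:affiliation", "ex:kaist"),
    ("ex:kaist", "ex:label", '"KAIST"'),
]


def is_literal(term):
    return term.startswith('"')


def graph_size(triples):
    """Return (node count, edge count) of the plain triple graph."""
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    return len(nodes), len(triples)


def group_literals(triples):
    """Assumed grouping rule: fold literal-valued edges into their subject.

    Returns (groups, edges), where groups maps each resource to its list of
    (predicate, literal text) pairs and edges keeps only the
    resource-to-resource triples.
    """
    groups = defaultdict(list)
    edges = []
    for s, p, o in triples:
        groups[s]  # make sure the subject has a group, even if empty
        if is_literal(o):
            groups[s].append((p, o.strip('"')))
        else:
            groups[o]  # the object resource becomes a group as well
            edges.append((s, p, o))
    return dict(groups), edges


def match(groups, keyword):
    """Return the grouped nodes whose literal text contains the keyword."""
    kw = keyword.lower()
    return sorted(
        node
        for node, attrs in groups.items()
        if any(kw in text.lower() for _, text in attrs)
    )


if __name__ == "__main__":
    groups, edges = group_literals(TRIPLES)
    print("original (nodes, edges):", graph_size(TRIPLES))
    print("grouped  (nodes, edges):", (len(groups), len(edges)))
    print("hits for 'kaist':", match(groups, "kaist"))
```

On this toy input the graph shrinks from 7 nodes and 6 edges to 3 grouped nodes and 2 edges, and a hit on a literal returns the resource together with its other attributes. The size reduction here is an artifact of the toy data and says nothing about the figures reported in the abstract; relevance scoring for ranking is not modelled at all.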
