Design and Implementation of a Query Processing Algorithm for Distributed Semistructured Documents Retrieval with Metadata Interface

Design and Implementation of a Query Processing Algorithm for Distributed Semistructured Document Retrieval Using a Metadata Interface

  • Published: 2005.06.01

Abstract

For distributed semistructured documents, it is very difficult to formalize and implement a query processing system because the data lack explicit structure and rules. To retrieve and process heterogeneous semistructured documents precisely, a system must handle multiple mappings of an element, such as 1:1, 1:N, and N:1, simultaneously, and it must generate the schema from the distributed documents. In this paper, we propose a query processing algorithm for querying and answering over heterogeneous semistructured data or documents in distributed systems, and we implement it with a metadata interface. The algorithm that generates local queries from a global query consists of four steps, all driven by metadata information for the global input query: mapping between global and local nodes, data transformation according to the mapping type, path substitution, and resolution of heterogeneity among nodes. The mapping, transformation, and path substitution algorithms between the global schema and the local schemas are implemented with a metadata interface called DDXMI (Distributed Documents XML Metadata Interface). Nodes that share the same name but have different mappings or meanings are distinguished by automatically extracting node identification information from the local schema. The system uses Quilt as its XML query language. Experimental results are reported on three different semistructured restaurant documents represented in the OEM model. The prototype system was developed on Windows with Java and the JavaCC compiler.
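The abstract lists four steps for turning a global query into local ones: node mapping, data transformation by mapping type, path substitution, and heterogeneity resolution. The Java sketch below illustrates the first three for a single equality predicate. It is not DDXMI. The class names (`NodeMapping`, `LocalQueryGenerator`), the prefix table, and the predicate syntax are assumptions made for illustration. The predicate syntax is loosely modeled on Quilt's `WHERE` clause, with XPath's `contains()` used for partial matches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Mapping kinds named in the abstract: 1:1, 1:N and N:1.
enum MappingType { ONE_TO_ONE, ONE_TO_MANY, MANY_TO_ONE }

// Hypothetical metadata for one global node in one local source (not DDXMI's actual format).
final class NodeMapping {
    final MappingType type;
    final List<String> localNodes;                  // local node names, relative to the bound variable
    final Function<String, List<String>> transform; // global value -> one value per local node

    NodeMapping(MappingType type, List<String> localNodes, Function<String, List<String>> transform) {
        this.type = type;
        this.localNodes = localNodes;
        this.transform = transform;
    }
}

// Hypothetical generator that rewrites parts of a global query for one local source.
final class LocalQueryGenerator {
    private final Map<String, String> prefixes = new LinkedHashMap<>();   // global prefix -> local prefix
    private final Map<String, NodeMapping> nodes = new LinkedHashMap<>(); // global node -> mapping

    void addPrefix(String globalPrefix, String localPrefix) { prefixes.put(globalPrefix, localPrefix); }

    void addNode(String globalNode, NodeMapping mapping) { nodes.put(globalNode, mapping); }

    // Path substitution: swap the longest matching global prefix (on a step boundary) for its local one.
    String substitutePath(String globalPath) {
        String best = null;
        for (String p : prefixes.keySet()) {
            boolean onBoundary = globalPath.equals(p) || globalPath.startsWith(p + "/");
            if (onBoundary && (best == null || p.length() > best.length())) {
                best = p;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no path mapping for " + globalPath);
        }
        return prefixes.get(best) + globalPath.substring(best.length());
    }

    // Rewrites the global predicate  var/globalNode = "value"  according to the mapping type.
    String rewritePredicate(String var, String globalNode, String value) {
        NodeMapping m = nodes.get(globalNode);
        if (m == null) {
            throw new IllegalArgumentException("no node mapping for " + globalNode);
        }
        List<String> values = m.transform.apply(value); // data transformation step
        if (values.size() != m.localNodes.size()) {
            throw new IllegalArgumentException("transform must yield one value per local node");
        }
        switch (m.type) {
            case ONE_TO_ONE:
                return var + "/" + m.localNodes.get(0) + " = \"" + values.get(0) + "\"";
            case ONE_TO_MANY: {
                // One global node is split over several local nodes: conjoin the parts.
                List<String> parts = new ArrayList<>();
                for (int i = 0; i < values.size(); i++) {
                    parts.add(var + "/" + m.localNodes.get(i) + " = \"" + values.get(i) + "\"");
                }
                return String.join(" AND ", parts);
            }
            case MANY_TO_ONE:
                // Several global nodes share one local node, so only a partial match is possible.
                return "contains(" + var + "/" + m.localNodes.get(0) + ", \"" + values.get(0) + "\")";
            default:
                throw new IllegalStateException("unknown mapping type " + m.type);
        }
    }
}
```

Suppose `phone` is registered as a 1:N mapping onto `area` and `number`, with a transform that splits at the first hyphen. Then `rewritePredicate("$r", "phone", "02-123-4567")` yields `$r/area = "02" AND $r/number = "123-4567"`. The N:1 case falls back to a `contains()` test because one local node holds more than the global node asks about. The abstract does not say whether the paper treats N:1 this way.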

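In distributed semistructured documents, no structural information is provided and the data follow no strict format, which makes it difficult to formalize and implement a query processing system. To retrieve elements of heterogeneous, differently structured semistructured documents accurately, a system must handle multiple mappings. In a multiple mapping, one element takes different mapping forms such as 1:1, 1:N, and N:1 while being mapped to several elements at the same time. The system must also parse the tags of the local documents to obtain structural information and generate a path tree. In this paper, we design an abstract query processing algorithm for integrated querying and retrieval over heterogeneous semistructured data or documents in distributed systems. The algorithm fully supports simultaneous multiple mappings and obtains structural information by parsing the documents themselves, and we implement it using a metadata interface. To generate local queries from a global query, the algorithm uses metadata to perform mapping between nodes, data transformation according to the mapping type, path substitution, and resolution of heterogeneity among nodes. Mapping between the global and local schemas, function-based data transformation, and path substitution are implemented on top of DDXMI (Distributed Documents XML Metadata Interface), a metadata interface built by the user. Retrieval of data or nodes that share a name but have different meanings is implemented by generating node-distinguishing condition clauses from the child information each node carries. Quilt is used as the XML query language, and results are shown for three different semistructured restaurant-guide documents represented in the OEM model. The prototype system was developed on Windows using Java and the JavaCC compiler.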
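The paragraph above adds two details: local document tags are parsed to recover structure and build a path tree, and same-named nodes with different meanings are told apart by condition clauses built from their child information. A minimal Java sketch of both ideas follows, using only the JDK's DOM parser. Several parts are assumptions for illustration rather than the paper's algorithm:

- the class `PathTreeBuilder` and its method names;
- the rule that a distinguishing child is one present under some, but not all, nodes at the same path;
- the XPath-style `[child]` predicate used as the condition clause.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper that derives structure from a local document by parsing its tags.
final class PathTreeBuilder {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse local document", e);
        }
    }

    private static List<Element> childElements(Element e) {
        List<Element> out = new ArrayList<>();
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                out.add((Element) kids.item(i)); // skip text and whitespace nodes
            }
        }
        return out;
    }

    private static Set<String> childNames(Element e) {
        Set<String> names = new TreeSet<>();
        for (Element c : childElements(e)) {
            names.add(c.getTagName());
        }
        return names;
    }

    private static void collect(Element e, String parentPath, Map<String, List<Element>> byPath) {
        String path = parentPath + "/" + e.getTagName();
        byPath.computeIfAbsent(path, k -> new ArrayList<>()).add(e);
        for (Element child : childElements(e)) {
            collect(child, path, byPath);
        }
    }

    // Groups every element of the document by its root-to-element path.
    static Map<String, List<Element>> elementsByPath(Document doc) {
        Map<String, List<Element>> byPath = new TreeMap<>();
        collect(doc.getDocumentElement(), "", byPath);
        return byPath;
    }

    // Path tree: each path mapped to the union of child tag names observed under it.
    static Map<String, Set<String>> buildPathTree(Document doc) {
        Map<String, Set<String>> tree = new TreeMap<>();
        for (Map.Entry<String, List<Element>> entry : elementsByPath(doc).entrySet()) {
            Set<String> names = new TreeSet<>();
            for (Element e : entry.getValue()) {
                names.addAll(childNames(e));
            }
            tree.put(entry.getKey(), names);
        }
        return tree;
    }

    // Child tags present under some, but not all, elements at the path (candidate identifiers).
    static Set<String> distinguishingChildren(Document doc, String path) {
        Set<String> union = new TreeSet<>();
        Set<String> common = null;
        for (Element e : elementsByPath(doc).getOrDefault(path, List.of())) {
            Set<String> names = childNames(e);
            union.addAll(names);
            if (common == null) {
                common = new TreeSet<>(names);
            } else {
                common.retainAll(names);
            }
        }
        if (common != null) {
            union.removeAll(common);
        }
        return union;
    }

    // Node-distinguishing condition written as an XPath predicate, e.g. /guide/place[menu].
    static String conditionClause(String path, String child) {
        return path + "[" + child + "]";
    }
}
```

Consider a document with two `<place>` elements, one containing `<menu>` and the other `<rooms>`. `distinguishingChildren(doc, "/guide/place")` returns `menu` and `rooms`, and `conditionClause("/guide/place", "menu")` produces `/guide/place[menu]`, which keeps only the restaurant-like places. Taking the union of child names per path keeps the tree small. The cost is that per-instance variation is only recovered on demand, by `distinguishingChildren`.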
