XML Element Matching Algorithm based on Structural Properties and Rules

  • Published : 2013.03.30

Abstract

XML schema matching is the task of finding semantic correspondences between the elements of two schemas. It plays an important role in many applications, such as schema integration, data integration, data warehousing, data transformation, peer-to-peer data management, and the Semantic Web. In this paper, we propose an XML element matching algorithm based on rules and structural properties. The algorithm first classifies elements as unique or non-unique according to the structural properties of the XML documents, and then decides whether elements match in accordance with a set of rules. We evaluate the algorithm on XML schemas that are publicly available on the Internet, and the experimental results demonstrate its effectiveness. Because the algorithm relies on the structural properties of the documents, it excludes user subjectivity and ensures objectivity, and it can be applied to XML of various forms rather than to one specific type.
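The two-stage idea in the abstract (classify elements as unique or non-unique, then apply matching rules) can be illustrated with a minimal Python sketch. The abstract does not give the actual uniqueness criterion or the rules, so everything concrete here is an assumption made for illustration: an element name is treated as "unique" when it occurs at exactly one root-to-element path, Rule 1 matches names that are unique in both documents, and Rule 2 additionally requires equal parent names. The function names (`element_paths`, `classify`, `parent_name`, `match`) and the sample documents are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def element_paths(xml_text):
    """Map each lower-cased element name to the set of paths where it occurs."""
    paths = defaultdict(set)

    def walk(node, prefix):
        name = node.tag.lower()
        path = f"{prefix}/{name}"
        paths[name].add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def classify(paths):
    """Assumed criterion: a name is 'unique' if it occurs at exactly one path."""
    return {name: "unique" if len(found) == 1 else "non-unique"
            for name, found in paths.items()}


def parent_name(path):
    """Return the parent element name of a path ('' for the root element)."""
    return path.split("/")[-2]


def match(source_xml, target_xml):
    """Pair up paths from two documents using two illustrative rules."""
    src, tgt = element_paths(source_xml), element_paths(target_xml)
    src_kind, tgt_kind = classify(src), classify(tgt)
    matches = []
    for name in sorted(src.keys() & tgt.keys()):
        if src_kind[name] == "unique" and tgt_kind[name] == "unique":
            # Rule 1 (assumed): unique on both sides, equal names are enough.
            matches.append((next(iter(src[name])), next(iter(tgt[name]))))
        else:
            # Rule 2 (assumed): otherwise the parent names must also agree.
            for sp in sorted(src[name]):
                for tp in sorted(tgt[name]):
                    if parent_name(sp) == parent_name(tp):
                        matches.append((sp, tp))
    return matches


# Hypothetical sample documents; "name" is non-unique in both.
SOURCE = ("<order><id>1</id><customer><name>A</name></customer>"
          "<item><name>B</name></item></order>")
TARGET = ("<purchase><id>1</id><customer><name>A</name></customer>"
          "<product><name>B</name></product></purchase>")

for pair in match(SOURCE, TARGET):
    print(pair)
```

Splitting classification from rule application keeps the rules easy to swap out. A real matcher would likely replace the plain name equality used here with a linguistic or semantic similarity measure.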
