A Unification Algorithm for DTDs of XML Documents Having a Similar Structure

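  • Chun-Sik Yoo (Department of Computer Science and Statistics, Chonbuk National University);
  • Seon-Mi Woo (BK21 Electronics and Information Project Group, Chonbuk National University);
  • Yong-Sung Kim (Division of Electronics and Information Engineering, Chonbuk National University)
  • Published: 2004.10.01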


Many XML documents that logically belong to the same kind of document and have similar structures are nevertheless classified as different kinds and end up with different document type definitions (DTDs). As a result, the database schemas used to store them differ, and XML documents that should be stored in the same database are stored in separate databases. To solve this problem, this paper proposes an algorithm that unifies the DTDs of XML documents with similar structures by using finite automata and tree structures. Finite automata are well suited to representing the repetition operators and connectors of a DTD, and their representation is simple, so they can reduce the complexity of the DTD unification algorithm. To verify the effectiveness of the proposed algorithm, we apply it to unifying the article DTDs of domestic academic society journals.
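To make the remark about repetition operators and connectors concrete, the following is a minimal Python sketch of how a DTD content model could be turned into a finite automaton. It is an illustration only and not the algorithm proposed in the paper: it handles just the sequence connector (`,`) and the occurrence indicators `?`, `*`, and `+`, and every name in it (`parse_content_model`, `build_automaton`, `accepts`, and the sample element names) is hypothetical.

```python
import re


def parse_content_model(model):
    """Split '(title, author+, abstract?)' into (name, operator) pairs.

    Assumption: only a flat sequence is supported, with no nested
    groups and no choice connector ('|').
    """
    inner = model.strip().strip("()")
    items = []
    for token in inner.split(","):
        match = re.fullmatch(r"\s*([\w.-]+)\s*([?*+]?)\s*", token)
        if match is None:
            raise ValueError(f"unsupported token: {token!r}")
        items.append((match.group(1), match.group(2)))
    return items


def build_automaton(items):
    """Build a nondeterministic automaton for a sequence content model.

    State i means 'the first i items have been matched'.
    """
    transitions = {}  # (state, element name) -> set of next states
    epsilon = {}      # state -> set of states reachable without input
    for i, (name, op) in enumerate(items):
        transitions.setdefault((i, name), set()).add(i + 1)
        if op in ("*", "+"):
            # Repetition: loop on the state reached after one occurrence.
            transitions.setdefault((i + 1, name), set()).add(i + 1)
        if op in ("?", "*"):
            # Optional: the item may be skipped without consuming input.
            epsilon.setdefault(i, set()).add(i + 1)
    return transitions, epsilon, len(items)


def closure(states, epsilon):
    """Return all states reachable from `states` via epsilon moves."""
    seen = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in epsilon.get(state, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def accepts(automaton, children):
    """Check whether a list of child element names matches the model."""
    transitions, epsilon, final = automaton
    current = closure({0}, epsilon)
    for name in children:
        reached = set()
        for state in current:
            reached |= transitions.get((state, name), set())
        current = closure(reached, epsilon)
    return final in current


article = build_automaton(
    parse_content_model("(title, author+, abstract?, section*)")
)
print(accepts(article, ["title", "author", "author", "section"]))
print(accepts(article, ["title", "abstract", "author"]))
```

Each occurrence indicator becomes either one loop or one epsilon move, which suggests why an automaton is a compact representation to compare and merge. The unification of two such automata is beyond this sketch.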


