
An Algorithm for Finding a Relationship Between Entities: Semi-Automated Schema Integration Approach

  • Kim, Yongchan (College of Business Administration, Seoul National University) ;
  • Park, Jinsoo (College of Business Administration, Seoul National University) ;
  • Suh, Jihae (Big Data Institute, Seoul National University)
  • Received : 2018.06.03
  • Accepted : 2018.09.28
  • Published : 2018.09.30

Abstract

Database schema integration is a significant issue in information systems. Because schema integration is time-consuming and labor-intensive, many studies have attempted to automate it. However, these studies typically use XML as the source schema and still leave much of the work to the database administrator (DBA). For example, schema integration gives rise to various naming conflicts involving relationship names, and until now the DBA has had to intervene manually to resolve them. In this paper, we introduce an algorithm that automatically generates relationship names to resolve the relationship-name conflicts that occur during schema integration. The algorithm is based on an online collocation dictionary and English example sentences: example sentences retrieved from the dictionary data are analyzed through natural language processing, and a relationship name between the two entities is generated from the results. We built a semi-automated schema integration system to test the algorithm and found that it achieved about 90% accuracy. Applying this algorithm allows naming conflicts that arise during schema integration to be resolved automatically, minimizing DBA intervention and supporting the development of automated schema integration systems.
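The abstract does not spell out the natural language processing steps, so the following is only a rough sketch of the general idea under stated assumptions. It assumes that a relationship name can be read off as the word(s) linking two entity names in example sentences, and it picks the most frequent such phrase to resolve a conflict between two source schemas. The sentence list, `STOPWORDS`, and every function name here are hypothetical. A real system would retrieve sentences from the collocation dictionary and would likely use a part-of-speech tagger instead of this positional heuristic.

```python
import re
from collections import Counter

# Hypothetical stand-in for example sentences retrieved from an online
# collocation / example-sentence dictionary (illustrative, not data from the paper).
EXAMPLE_SENTENCES = [
    "The customer places an order online.",
    "Every customer places an order before noon.",
    "A customer cancels an order occasionally.",
    "The student takes a course each semester.",
    "Each student takes a course in statistics.",
    "A student drops a course after midterms.",
]

# Function words dropped from the linking phrase (an assumption of this sketch).
STOPWORDS = {"a", "an", "the", "every", "each", "some", "their", "his", "her"}


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def candidate_names(entity_a, entity_b, sentences):
    """Count the phrases found between entity_a and a later entity_b."""
    a, b = entity_a.lower(), entity_b.lower()
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        if a not in tokens:
            continue
        i = tokens.index(a)
        try:
            j = tokens.index(b, i + 1)  # entity_b must follow entity_a
        except ValueError:
            continue
        between = [t for t in tokens[i + 1:j] if t not in STOPWORDS]
        if between:
            counts[" ".join(between)] += 1
    return counts


def generate_relationship_name(entity_a, entity_b, sentences):
    """Return the most frequent linking phrase, or None if no sentence matched."""
    counts = candidate_names(entity_a, entity_b, sentences)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def resolve_conflict(pair, name_1, name_2, sentences):
    """Resolve a relationship-name conflict between two source schemas.

    If both schemas already agree, keep that name; otherwise use the
    dictionary-derived name, falling back to name_1 if none is found.
    """
    if name_1 == name_2:
        return name_1
    generated = generate_relationship_name(pair[0], pair[1], sentences)
    return generated if generated is not None else name_1


if __name__ == "__main__":
    # Schema 1 calls Customer-Order "makes"; schema 2 calls it "submits".
    print(resolve_conflict(("Customer", "Order"), "makes", "submits", EXAMPLE_SENTENCES))
```

Counting over many sentences keeps a single unusual example from deciding the name. The fallback to `name_1` reflects the semi-automated setting, where an unresolved case would still go to the DBA for review.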
