The Design and Implementation of OWL Ontology Construction System through Information Extraction of Unstructured Documents

  • Jo, Dae Woong (Department of Computer Science and Engineering, Soongsil University) ;
  • Choi, Ji Woong (School of Computer Science and Engineering, Soongsil University) ;
  • Kim, Myung Ho (School of Computer Science and Engineering, Soongsil University)
  • Received : 2014.08.05
  • Reviewed : 2014.09.03
  • Published : 2014.10.31

Abstract

Research in information retrieval is broadening from quickly finding large amounts of information to accurately finding the information a person actually wants. Personalization and semantic web technologies are the key enabling technologies. Automatic indexing and processing of web documents have moved beyond the research stage into practical services. However, research on retrieving information from attached documents, as opposed to web documents, remains insufficient. In this paper, we describe a method that analyzes the body text of unstructured documents written in formats such as plain text, Word, and HWP, and constructs an OWL ontology from it. We build the TBox of a document ontology, select the resources that can be obtained from a document, and implement a system that uses those resources as instances of the constructed document ontology. Automatically constructing ontologies for unstructured documents in this way allows them to be used effectively in information retrieval and document management systems based on semantic technology.
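The pipeline described in the abstract (a fixed TBox, plus instances generated from resources extracted out of each document) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `http://example.org/doc#` namespace, the `Document` and `Keyword` classes, the `hasKeyword` and `title` properties, and the frequency-based keyword extractor are hypothetical stand-ins, not the ontology or the extraction method used in the paper. The sketch also works on English plain text only, whereas the paper targets text, Word, and HWP files.

```python
import re
from collections import Counter

# Hypothetical namespace; the paper's actual vocabulary is not given in the abstract.
DOC = "http://example.org/doc#"

# Assumed TBox: two classes and two properties, written as plain triples.
TBOX = [
    (DOC + "Document", "rdf:type", "owl:Class"),
    (DOC + "Keyword", "rdf:type", "owl:Class"),
    (DOC + "hasKeyword", "rdf:type", "owl:ObjectProperty"),
    (DOC + "title", "rdf:type", "owl:DatatypeProperty"),
]


def extract_keywords(text, top_n=3):
    """Naive frequency-based extraction, a stand-in for real morphological analysis."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 3)
    # Sort by descending count, then alphabetically, so results are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def build_abox(doc_id, title, body):
    """Create instance (ABox) triples for one document against the assumed TBox."""
    subject = DOC + doc_id
    triples = [
        (subject, "rdf:type", DOC + "Document"),
        (subject, DOC + "title", '"%s"' % title),  # no escaping; sketch only
    ]
    for keyword in extract_keywords(body):
        keyword_iri = DOC + "kw_" + keyword
        triples.append((keyword_iri, "rdf:type", DOC + "Keyword"))
        triples.append((subject, DOC + "hasKeyword", keyword_iri))
    return triples


def fmt(term):
    """Wrap full IRIs in angle brackets; leave prefixed names and literals as-is."""
    return "<%s>" % term if term.startswith("http") else term


def serialize(triples):
    """Render triples one per line in a Turtle-like form (prefix declarations omitted)."""
    return "\n".join("%s %s %s ." % tuple(fmt(t) for t in triple) for triple in triples)


if __name__ == "__main__":
    body = "ontology ontology search search search index"
    print(serialize(TBOX + build_abox("d1", "Sample", body)))
```

In a fuller implementation the triples would presumably be handed to an OWL library rather than printed, and the extractor would be replaced by format-specific text extraction followed by language-aware analysis.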
