The Design and Implementation of OWL Ontology Construction System through Information Extraction of Unstructured Documents

Jo, Dae Woong;Choi, Ji Woong;Kim, Myung Ho;

doi:10.9708/jksci.2014.19.10.023

Journal of the Korea Society of Computer and Information (한국컴퓨터정보학회논문지)

Volume 19 Issue 10 / Pages 23-33 / 2014 / 1598-849X (pISSN) / 2383-9945 (eISSN)

Korean Society of Computer Information (한국컴퓨터정보학회)


비정형 문서의 정보추출을 통한 OWL 온톨로지 구축 시스템의 설계 및 구현

Jo, Dae Woong (Department of Computer Science and Engineering, Soongsil University) ;
Choi, Ji Woong (School of Computer Science and Engineering, Soongsil University) ;
Kim, Myung Ho (School of Computer Science and Engineering, Soongsil University)

조대웅 (숭실대학교 대학원 컴퓨터학과) ;
최지웅 (숭실대학교 컴퓨터학부) ;
김명호 (숭실대학교 컴퓨터학부)

Received : 2014.08.05
Accepted : 2014.09.03
Published : 2014.10.31


Abstract

Research in information retrieval is broadening from quickly finding large amounts of information to accurately finding the information a user actually wants. Personalization and semantic web technologies are the key technologies in this shift. Automatic indexing and processing of web documents have moved beyond the research stage and now appear in practical services. However, research on information retrieval for attached documents other than web documents remains insufficient. In this paper, we describe a method that analyzes the body text of unstructured documents written in text, Word, and HWP formats and constructs an OWL ontology from it. We build the TBox of a document ontology, select the resources that can be obtained from a document, and implement a system that uses those resources as instances of the constructed document ontology. Automatic ontology construction of this kind can be used effectively in information retrieval and document management systems that apply semantic technology to such documents.

정보검색 분야의 발전은 많은 양의 정보를 빠르게 찾아주는 것에서 사람이 원하는 정보를 정확하게 찾아주는 연구 분야로 넓혀가고 있다. 핵심 기술로는 개인화 및 시맨틱 웹 기술을 활용하고 있다. 웹 문서에 대한 자동색인 기술과 처리능력은 연구단계를 넘어 실용 서비스로 나타나고 있다. 하지만 웹 문서 이외의 첨부된 문서 형태에 대한 문서정보검색에 관한 연구는 미진한 상황이다. 본 논문에서는 텍스트, 워드, 한글과 같은 형식으로 작성된 비정형 문서의 본문 내용을 분석하여 OWL 온톨로지로 구축하는 방법에 대해 설명한다. 문서 온톨로지의 TBox를 구축하고, 문서로부터 얻을 수 있는 자원을 선정하여, 구축된 문서 온톨로지의 인스턴스로 활용할 수 있도록 시스템으로 구현한다. 이와 같은 비정형 문서의 온톨로지 자동 구축으로 해당 문서의 시맨틱 기술을 이용한 정보검색 및 문서관리 시스템에서 효과적으로 활용 가능하다.
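The pipeline outlined in the abstract (define a TBox, extract resources from a document body, emit them as instances) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `http://example.org/doc-ontology#` namespace, the `Document` and `Keyword` classes, the `hasKeyword` and `title` properties, and the simple word-frequency heuristic standing in for keyword extraction. None of these are taken from the paper, and parsing of Word or HWP files is left out; the sketch starts from already extracted plain text.

```python
import re
from collections import Counter

# Hypothetical TBox in Turtle syntax; the paper's actual classes and
# properties are not listed in the abstract, so these names are illustrative.
TBOX = """@prefix : <http://example.org/doc-ontology#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:Document a owl:Class .
:Keyword a owl:Class .
:hasKeyword a owl:ObjectProperty ; rdfs:domain :Document ; rdfs:range :Keyword .
:title a owl:DatatypeProperty ; rdfs:domain :Document ; rdfs:range xsd:string .
"""


def top_terms(text, k=3):
    """Return the k most frequent lowercase words of four or more letters.

    This is a stand-in for real keyword extraction (for Korean text a
    morphological analyzer would be needed instead of a regex).
    """
    words = re.findall(r"[a-z]{4,}", text.lower())
    return [word for word, _ in Counter(words).most_common(k)]


def escape_literal(value):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def abox_for(doc_id, title, text, k=3):
    """Build Turtle triples (ABox) describing one document as an instance.

    doc_id is assumed to be a simple identifier that is valid as a
    Turtle local name (letters, digits, underscores).
    """
    lines = [
        f":{doc_id} a :Document ;",
        f'    :title "{escape_literal(title)}" .',
    ]
    for term in top_terms(text, k):
        lines.append(f":kw_{term} a :Keyword .")
        lines.append(f":{doc_id} :hasKeyword :kw_{term} .")
    return "\n".join(lines)


if __name__ == "__main__":
    body = "Ontology construction needs ontology classes and ontology instances."
    print(TBOX + abox_for("doc1", "Sample document", body, k=2))
```

Concatenating `TBOX` with the output of `abox_for` gives a single Turtle document, which keeps the schema and the extracted instances in one file; a production system would more likely use an ontology library rather than string templates.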



Cited by

Ontology-Based Large-Scale Cohort DB Search Simulation, vol.25, pp.1, 2014, https://doi.org/10.9709/jkss.2016.25.1.029
