• Title/Summary/Keyword: Predicate Ontology (서술어 온톨로지)

Automatic Ontology Generation from Natural Language Sentences Using Predicate Ontology (서술어 온톨로지를 이용한 자연어 문장으로부터의 온톨로지 자동 생성)

  • Min, Young-Kun;Lee, Bog-Ju
    • Journal of Korea Multimedia Society
    • /
    • v.13 no.9
    • /
    • pp.1263-1271
    • /
    • 2010
  • Ontologies, important implementation tools for the semantic web, are widely used in areas such as search, reasoning, and knowledge representation. Developing well-defined ontologies, however, requires a lot of time and material resources. There have been efforts to construct ontologies automatically to overcome this problem. In this paper, ontologies are constructed automatically and directly from natural language sentences. To do this, morphological and sentence-structure analysis is performed first. Then the program finds the predicates inside the sentence and transforms them into the corresponding ontology predicates. To match a predicate in the sentence to the corresponding ontology predicate, we develop the "predicate ontology". An experimental comparison between a human ontology engineer and the program shows that the proposed system outperforms the human engineer in accuracy.
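
The matching step this abstract describes (sentence predicate → ontology predicate via the predicate ontology) can be sketched, in a deliberately simplified form, as a lookup table; the surface forms and canonical predicate names below are invented for illustration, not taken from the paper:

```python
# Toy "predicate ontology": surface predicate forms -> canonical ontology
# predicates. In the paper this mapping is an OWL ontology, not a dict.
PREDICATE_ONTOLOGY = {
    "was born in": "bornIn",
    "is located in": "locatedIn",
    "works for": "worksFor",
}

def to_ontology_triple(subject, surface_predicate, obj):
    """Map a (subject, predicate, object) extraction to an ontology triple."""
    canonical = PREDICATE_ONTOLOGY.get(surface_predicate.lower())
    if canonical is None:
        return None  # no matching ontology predicate found
    return (subject, canonical, obj)

print(to_ontology_triple("Kim", "was born in", "Seoul"))
# -> ('Kim', 'bornIn', 'Seoul')
```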

Predicate Ontology for Automatic Ontology Building (온톨로지 자동 구축을 위한 서술어 온톨로지)

  • Min, Young-Kun;Lee, Bog-Ju
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.05a
    • /
    • pp.28-31
    • /
    • 2008
  • Ontologies, the foundation of the semantic web, are used in various fields such as search, reasoning, and knowledge representation. Developing a well-constructed ontology, however, consumes a lot of time and material resources, and building ontologies automatically can reduce this cost. In this paper, to apply natural language processing to automatic ontology construction, we propose a predicate ontology that can convert the predicate part of a natural language sentence into an ontology predicate, and we introduce an algorithm that performs this conversion using the proposed predicate ontology. The proposed ontology was also built in the ontology language OWL.

Linking Korean Predicates to Knowledge Base Properties (한국어 서술어와 지식베이스 프로퍼티 연결)

  • Won, Yousung;Woo, Jongseong;Kim, Jiseong;Hahm, YoungGyun;Choi, Key-Sun
    • Journal of KIISE
    • /
    • v.42 no.12
    • /
    • pp.1568-1574
    • /
    • 2015
  • Relation extraction plays a key role in the process of transforming a sentence into a form of knowledge base. In this paper, we focus on predicates in a sentence and aim to identify the relevant knowledge base properties required to elucidate the relationship between entities, which enables a computer to understand the meaning of a sentence more clearly. Distant supervision is a well-known approach for relation extraction; it performs lexicalization tasks for knowledge base properties by generating a large amount of labeled data automatically. In other words, the predicate in a sentence is linked or mapped to the possible properties defined by ontologies in the knowledge base. This lexical and ontological linking of information provides a way of generating structured information and a basis for enrichment of the knowledge base.
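
The distant-supervision idea in this abstract — automatically labeling a sentence with a knowledge base property when the sentence mentions an entity pair the KB already relates — can be sketched as follows; the KB triple, property name, and sentence are invented examples:

```python
# Toy KB: entity pair -> property that relates them. The "dbo:" prefix is a
# placeholder in the style of DBpedia properties, used here illustratively.
KB = {("Seoul", "South Korea"): "dbo:country"}

def label_sentences(sentences):
    """Distant supervision: any sentence containing a KB-related entity
    pair is labeled with that pair's KB property."""
    labeled = []
    for s in sentences:
        for (e1, e2), prop in KB.items():
            if e1 in s and e2 in s:
                labeled.append((s, e1, e2, prop))
    return labeled

data = label_sentences(["Seoul is the capital of South Korea."])
```

The labeled tuples then serve as training data that links the sentence's predicate ("is the capital of") to the KB property.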

Design of a Contextual Lexical Knowledge Graph Extraction Algorithm (맥락적 어휘 지식 그래프 추출 알고리즘의 설계)

  • Nam, Sangha;Choi, Gyuhyeon;Hahm, Younggyun;Choi, Key-Sun
    • Annual Conference on Human and Language Technology
    • /
    • 2016.10a
    • /
    • pp.147-151
    • /
    • 2016
  • This paper presents a Korean open information extraction method for extracting reified triples. In the semantic web field, knowledge is commonly represented as RDF triples, but a natural language sentence consists of relations between multiple predicates and their arguments. For this reason, a new open information extraction system is needed that follows the triple, the representative knowledge representation of the semantic web, while reflecting the dependency structure of the sentence to turn the relations between multiple predicates and arguments into knowledge. This paper proposes a new open information extraction method that considers consistent transformation of sentence structure, and a reified triple extraction method that can represent both entity-centered and event-centered knowledge. To demonstrate the effectiveness and practicality of the proposed method, experiments measuring the quantity and accuracy of the knowledge extracted from Korean Wikipedia featured articles were performed, and a pseudo-SPARQL query generation module applying the proposed method is introduced.
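
The reified-triple representation the abstract describes — one statement node connecting a predicate to several arguments, rather than a single binary triple — can be sketched like this; the role names and the example parse are illustrative, not the paper's actual output format:

```python
def reify(predicate, args, event_id="stmt1"):
    """Turn a predicate with n arguments into n+1 RDF-style triples:
    one linking the statement node to the predicate, and one per argument.
    This lets entity-centered and event-centered knowledge coexist."""
    triples = [(event_id, "predicate", predicate)]
    for role, value in args.items():
        triples.append((event_id, role, value))
    return triples

triples = reify("established", {"agent": "Sejong", "theme": "Hangul", "year": "1443"})
```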

Construction of Metaphor Ontology Using HowNet: Based on the Concept 'Culture' (HowNet 기반 은유 온톨로지 구축: 추상개념 '문화'를 중심으로)

  • An, Dong-Gun;Choi, Key-Sun
    • Annual Conference on Human and Language Technology
    • /
    • 2006.10e
    • /
    • pp.205-212
    • /
    • 2006
  • This study proposes an algorithm, built on the HowNet knowledge system, that analyzes the predicates of conceptual-metaphor expressions (which enable abstract thinking) to find the source domains of abstract concepts. Using 242 example sentences of metaphorical expressions containing the abstract concept 'culture', the proposed algorithm finds the source domains; based on this, we present a method for building a HowNet-based metaphor ontology in which the source domains of the target domain 'culture' are automatically inferred by a reasoner. In addition, by comparing the source domains of Korean '문화' and English 'culture', we show how the constructed ontology can be used for English translation and composition.
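
A toy illustration of the core idea — the predicate co-occurring with an abstract noun hints at the metaphor's source domain — assuming a hypothetical predicate-to-domain table in place of the HowNet sememe analysis the paper actually uses:

```python
# Invented predicate -> source-domain table; the paper derives this kind of
# association from HowNet sememes rather than a hand-written dict.
PREDICATE_DOMAIN = {
    "flows": "liquid",     # "culture flows" -> culture framed as a liquid
    "grows": "plant",      # "culture grows" -> culture framed as a plant
    "is built": "building",
}

def source_domain(predicate):
    """Guess the metaphor's source domain from the predicate used."""
    return PREDICATE_DOMAIN.get(predicate, "unknown")

domain = source_domain("flows")
```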

Ontology Knowledge Base Scheme for User Query Semantic Interpretation (사용자 질의 의미 해석을 위한 온톨로지 지식베이스 스키마 구축)

  • Doh, Hana;Lee, Moo-Hun;Jeong, Hoon;Choi, Eui-In
    • Journal of Digital Convergence
    • /
    • v.11 no.3
    • /
    • pp.285-292
    • /
    • 2013
  • Information retrieval is moving toward semantic search, which provides more accurate results than keyword-based search. Common users, however, are still accustomed to keyword-based search and find it hard to create a typed, structured query. In this paper, we propose an ontology knowledge base scheme for interpreting the queries of such users. The proposed scheme is designed on OWL-DL for description logic reasoning, and it can provide a richer representation of the relationships between objects by using SWRL (Semantic Web Rule Language). Finally, we describe the experimental results of a similarity measurement used to verify the semantic interpretation of user queries.
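
The kind of relation enrichment the abstract attributes to SWRL — rules that derive new object relationships from existing ones — can be illustrated with a hand-rolled rule in plain Python; in practice such rules run inside an OWL-DL reasoner, and the property names here are invented:

```python
# SWRL-style rule sketched imperatively:
#   hasParent(x, y) ∧ hasBrother(y, z) -> hasUncle(x, z)
facts = {("Ann", "hasParent", "Bob"), ("Bob", "hasBrother", "Carl")}

def apply_uncle_rule(facts):
    """Derive hasUncle triples from hasParent/hasBrother facts."""
    derived = set(facts)
    for (x, p, y) in facts:
        if p != "hasParent":
            continue
        for (y2, q, z) in facts:
            if q == "hasBrother" and y2 == y:
                derived.add((x, "hasUncle", z))
    return derived

enriched = apply_uncle_rule(facts)
```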

Movie Recommended System base on Analysis for the User Review utilizing Ontology Visualization (온톨로지 시각화를 활용한 사용자 리뷰 분석 기반 영화 추천 시스템)

  • Mun, Seong Min;Kim, Gi Nam;Choi, Gyeong cheol;Lee, Kyung Won
    • Design Convergence Study
    • /
    • v.15 no.2
    • /
    • pp.347-368
    • /
    • 2016
  • Recent research on word of mouth (WOM) implies that consumers use WOM information about products in their purchase process. This study suggests methods that use opinion mining and visualization to understand consumers' opinions of individual products and markets. We develop a domain ontology based on reviews confined to the "movie" category, since people deciding whether to watch a movie commonly consult others' reviews, and analyze it with opinion mining and visualization. The study differs from other research in classifying the attributes of evaluation factors and compiling a verbal dictionary for those factors when building the ontology for analysis, and we examine whether this research method is valid. The results can be divided into three parts. First, the research explains methods of developing a domain ontology using keyword extraction and topic modeling. Second, we visualize the reviews of each movie to understand the overall audience opinion about specific movies. Third, we find clusters consisting of products that received similar assessments. The case study shows three clusters, containing 130 movies, grouped according to audience opinion.
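
A hedged sketch of factor-level opinion aggregation: each review sentence is matched to an evaluation factor and a polarity word, and polarity scores are averaged per factor. The dictionaries below are toy stand-ins for the paper's evaluation-factor verbal dictionary:

```python
# Invented factor and polarity dictionaries for illustration only.
FACTOR_WORDS = {"acting": ["performance", "actor"], "plot": ["story", "plot"]}
POLARITY = {"great": 1, "boring": -1}

def score_reviews(reviews):
    """Average polarity per evaluation factor across review sentences."""
    scores = {}
    for review in reviews:
        for factor, words in FACTOR_WORDS.items():
            if any(w in review for w in words):
                for word, pol in POLARITY.items():
                    if word in review:
                        scores.setdefault(factor, []).append(pol)
    return {f: sum(v) / len(v) for f, v in scores.items()}

result = score_reviews(["The story was great.", "The plot felt boring."])
```

Per-factor averages like these are what the visualization would then plot, and movies with similar factor profiles would fall into the same cluster.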

A Trustworthiness Improving Link Evaluation Technique for LOD considering the Syntactic Properties of RDFS, OWL, and OWL2 (RDFS, OWL, OWL2의 문법특성을 고려한 신뢰향상적 LOD 연결성 평가 기법)

  • Park, Jaeyeong;Sohn, Yonglak
    • Journal of KIISE:Databases
    • /
    • v.41 no.4
    • /
    • pp.226-241
    • /
    • 2014
  • LOD (Linked Open Data) is composed of RDF triples that are based on ontologies. They are identified, linked, and accessed under the principles of linked data. Publications of LOD data sets extend the LOD cloud and ultimately progress toward the web of data. However, if ontologically identical things in different LOD data sets are identified by different URIs, it is difficult to figure out their sameness and to provide trustworthy links among them. To solve this problem, we suggest a Trustworthiness Improving Link Evaluation (TILE) technique. TILE evaluates links in four steps. Step 1 considers the inference properties of the syntactic elements in an LOD data set and generates the RDF triples that had existed only implicitly. In Step 2, TILE appoints predicates, compares their objects in triples, and evaluates the links between the subjects of the triples. In Step 3, TILE evaluates the predicates' syntactic properties from the standpoints of subject description and vocabulary definition and compensates the evaluation results of Step 2. The syntactic elements considered by TILE include RDFS, OWL, and OWL2, which are recommended by W3C. Finally, TILE has the publisher of the LOD data set review the evaluation results and decide whether to re-evaluate or finalize the links, so that the publisher's responsibility is reflected in the trustworthiness of the links among the published data.
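
Step 1 of TILE, generating triples that exist only implicitly, can be illustrated with one of the RDFS inferences involved, rdfs:subPropertyOf: every (s, p, o) where p is a subproperty of q also implies (s, q, o). A minimal fixed-point sketch, with invented property names:

```python
# Invented subPropertyOf assertion, in the spirit of rdfs:subPropertyOf.
sub_property_of = {"ex:capitalOf": "ex:locatedIn"}

def infer(triples):
    """Compute the subPropertyOf closure: iterate until no new triple is
    derivable, so chains of subproperties are also handled."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(inferred):
            q = sub_property_of.get(p)
            if q and (s, q, o) not in inferred:
                inferred.add((s, q, o))
                changed = True
    return inferred

closure = infer({("ex:Seoul", "ex:capitalOf", "ex:Korea")})
```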

Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.111-136
    • /
    • 2018
  • In this paper, we propose a methodology for extracting answer information for queries from various types of unstructured documents collected from multiple web sources, in order to expand a knowledge base. The proposed methodology consists of the following steps: 1) collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for "subject-predicate" separated queries and classify the proper documents; 2) determine whether each sentence is suitable for extracting information and derive a confidence; 3) based on the predicate feature, extract the information from the proper sentences and derive the overall confidence of the extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from SK Telecom's artificial intelligence speaker; compared with the baseline model, the proposed model shows a higher performance index. The contribution of this study is a sequence tagging model based on bi-directional LSTM-CRF that uses the predicate feature of the query; with it, we developed a robust model that maintains high recall even on the various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types. The proposed methodology proved to extract information effectively from various document types compared to the baseline model, whereas previous research performs poorly when extracting information from document types that differ from the training data.
In addition, this study can prevent unnecessary extraction attempts on documents that do not include the answer information, through the process of predicting the suitability of documents and sentences for extraction before the extraction step; this provides a way to maintain precision even in an actual web environment. Information extraction for knowledge base expansion targets unstructured documents on the real web, so there is no guarantee that a document includes the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies show a low level of precision because they frequently attempt to extract an answer even from documents that contain no correct answer. The policy of predicting the suitability of documents and sentences for extraction thus contributes to maintaining extraction performance in a real web environment. The limitations of this study and future research directions are as follows. First, there is a problem related to data preprocessing. In this study, the unit of knowledge extraction is determined through morphological analysis based on the open-source KoNLPy Python package, and the extraction result can be improper when the morphological analysis is not performed properly; an advanced morphological analyzer is needed to enhance the extraction results. Second, there is the problem of entity ambiguity. The information extraction system of this study cannot distinguish identical names with different referents; if several people with the same name appear in the news, the system may not extract information about the intended query.
In future research, measures to identify people with the same name are necessary. Third, there is the problem of the evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system, and we developed an evaluation data set using 800 documents (400 questions * 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, 3 Naver news) by judging whether a correct answer is included. To ensure the external validity of the study, it is desirable to use more queries to assess the system; this is a costly activity that must be done manually, and future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction on queries over multi-source web documents, to build an environment where results can be evaluated more objectively.
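
The suitability-prediction policy discussed above can be sketched with a stand-in heuristic: score whether a sentence is likely to contain the answer for the query's predicate and skip extraction otherwise. The keyword scoring below replaces the paper's learned classifier and is purely illustrative:

```python
def sentence_suitable(sentence, predicate_keywords, threshold=1):
    """Stand-in suitability score: count predicate-keyword hits and
    compare against a threshold. The paper uses a trained model instead."""
    hits = sum(1 for kw in predicate_keywords if kw in sentence)
    return hits >= threshold

def extract_if_suitable(sentences, predicate_keywords):
    """Only pass sentences judged suitable on to the (costly) extractor,
    avoiding extraction attempts on answer-free text."""
    return [s for s in sentences if sentence_suitable(s, predicate_keywords)]

kept = extract_if_suitable(
    ["The singer was born in Busan in 1990.", "Ticket sales open tomorrow."],
    ["born"],
)
```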