• Title/Summary/Keyword: 백과사전 (encyclopedia)

Restoring Omitted Sentence Constituents in Encyclopedia Documents Using Structural SVM (Structural SVM을 이용한 백과사전 문서 내 생략 문장성분 복원)

  • Hwang, Min-Kook;Kim, Youngtae;Ra, Dongyul;Lim, Soojong;Kim, Hyunki
    • Journal of Intelligence and Information Systems, v.21 no.2, pp.131-150, 2015
  • Omission of noun phrases in obligatory cases is common in Korean and Japanese sentences but is not observed in English. In encyclopedia texts, an argument of a predicate is omitted even more readily when it can be filled with a noun phrase co-referential with the title. The omitted noun phrase is called a zero anaphor or zero pronoun. Encyclopedias like Wikipedia are a major source for information extraction by intelligent application systems such as information retrieval and question answering systems. However, omitted noun phrases degrade the quality of information extraction. This paper addresses the problem of developing a system that can restore omitted noun phrases in encyclopedia documents. The problem is very similar to zero anaphora resolution, one of the important problems in natural language processing. A noun phrase in the text that can be used for restoration is called an antecedent; it must be co-referential with the zero anaphor. Whereas in zero anaphora resolution the antecedent candidates are only the noun phrases in the same text, in our problem the title is also a candidate. Our system works in three stages. The first stage detects the zero anaphor. The second stage searches for the antecedent among the candidates. If that search fails, the third stage attempts to use the title as the antecedent. The main characteristic of our system is its use of a structural SVM to find the antecedent. The search space consists of the noun phrases that appear in the text before the position of the zero anaphor. The main technique in previous work is to perform binary classification on every noun phrase in the search space and select the noun phrase classified as an antecedent with the highest confidence.
However, we propose in this paper to view antecedent search as the problem of assigning antecedent indicator labels to a sequence of noun phrases. In other words, sequence labeling is employed for antecedent search in the text. We are the first to suggest this idea. To perform sequence labeling, we propose using a structural SVM that receives a sequence of noun phrases as input and returns a sequence of labels as output. Each output label takes one of two values: one indicating that the corresponding noun phrase is the antecedent, and the other indicating that it is not. The structural SVM we used is based on the modified Pegasos algorithm, which applies subgradient descent to the optimization problem. To train and test our system, we selected a set of Wikipedia texts and constructed an annotated corpus that provides gold-standard answers such as zero anaphors and their possible antecedents. Training examples prepared from the annotated corpus are used to train the SVMs and test the system. For zero anaphor detection, sentences are parsed by a syntactic analyzer, and omitted subject or object cases are identified. The performance of our system therefore depends on that of the syntactic analyzer, which is a limitation. When an antecedent is not found in the text, our system tries to use the title to restore the zero anaphor; this decision is based on binary classification using a regular SVM. In our experiments the system achieved F1 = 68.58%. This suggests that a state-of-the-art system can be developed with our technique. Future work that enables the system to use semantic information is expected to lead to a significant performance improvement.
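The abstract mentions Pegasos-style subgradient descent without giving details. As a rough illustration only, the sketch below shows the standard Pegasos update for a plain binary linear SVM, not the authors' modified structural variant. The feature vectors, the function names `pegasos_train` and `predict`, and the hyperparameter values are all hypothetical.

```python
import random

# Hypothetical toy data: (feature_vector, label) pairs with label in {+1, -1}.
# In an antecedent-search setting, +1 could mean "is the antecedent".
TOY_EXAMPLES = [
    ([2.0, 2.0], 1), ([3.0, 1.0], 1), ([2.0, 3.0], 1),
    ([-2.0, -2.0], -1), ([-3.0, -1.0], -1), ([-1.0, -3.0], -1),
]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pegasos_train(examples, lam=0.01, iterations=1000, seed=0):
    """Train a bias-free linear SVM with Pegasos-style subgradient steps."""
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    for t in range(1, iterations + 1):
        x, y = rng.choice(examples)      # sample one training example
        eta = 1.0 / (lam * t)            # decaying step size 1 / (lambda * t)
        margin = y * dot(w, x)           # margin under the current weights
        # Shrink step: subgradient of the L2 regularizer.
        w = [(1.0 - eta * lam) * wi for wi in w]
        # Hinge-loss step: applied only when the margin constraint is violated.
        if margin < 1.0:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if dot(w, x) >= 0 else -1


if __name__ == "__main__":
    weights = pegasos_train(TOY_EXAMPLES)
    print([predict(weights, x) for x, _ in TOY_EXAMPLES])
```

A structural variant would replace the single-example hinge loss with a loss over whole label sequences, which this sketch does not attempt.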

An Analytic Study of the Current Korean printed Encyclopedia (한국의 현행 인쇄본 백과사전에 관한 분석적 연구)

  • 김문영;박준식
    • Journal of Korean Library and Information Science Society, v.31 no.1, pp.1-21, 2000
  • This study analyzes the general printed encyclopedias published and circulated in Korea. Its main purpose is to offer extensive information that helps users and librarians select superior encyclopedias. For this purpose, four printed encyclopedias published and circulated in Korea since 1993 were selected and analyzed against six evaluation standards (purpose & scope, authority, accessibility, clarity & readability, format, and special features & cost). In addition, four printed encyclopedias from among the best long-established American encyclopedias were analyzed alongside them to verify the validity of the evaluation standards.

The Korean Dictionary and the Buddhist Language: Description of Popularity of Buddhism Terminologies (국어사전과 불교 언어: '불교' 영역의 전문용어 기술을 중심으로)

  • Kim, Han-saem
    • Cross-Cultural Studies, v.45, pp.195-218, 2016
  • This paper focuses on dictionary terminology as an encyclopedic element in the Buddhist language. Previous studies on Buddhist language can be divided into philosophical approaches to the language itself, reviews of the linguistic perspective of a specific monk, and linguistic examinations of the Buddhist language. Linguistic examinations are further divided into analyses of particular Buddhist scriptures and studies of vocabulary used throughout the Buddhist sphere. Buddhist vocabulary in existing Korean dictionaries is found in specialized areas such as personal names, place names, history, and Buddhism. As technical terms become generalized, the meanings of the terms and of the general words are described as polysemy. The degree of transition from technical term to general word can be identified from the distribution of senses. In the case of Buddhist vocabulary, usage as a general word and usage as a Buddhist term were each given priority in different cases, and some vocabulary described as general language was in fact derived from a Buddhist term. If a dictionary changes according to the language units it contains, the differences in Buddhist terminology between dictionaries, and the conflicts between existing academic research results and prior descriptions, will need to be resolved through collaboration among religious experts, linguists, and lexicographers.

Knowledge-Based Question Answering System for Acquisition of Concept Word (개념어의 습득을 위한 지식기반 질의응답 시스템)

  • Lee, Jae-Hong;Choe, Ho-Seop;Ock, Cheol-Young
    • Annual Conference on Human and Language Technology, 2003.10d, pp.95-100, 2003
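  • Drawing mainly on Korean dictionaries, encyclopedias, and similar resources in which real-world knowledge has already been refined in a reasonably systematic way, this paper uses a hybrid method to efficiently build a statistics-based knowledge base and a lexicon-classification-based knowledge base, and applies them to a question answering system. Unlike typical question answering systems, which return specific documents, we propose a way to build a question answering system that uses these knowledge bases to help users acquire the exact concept word (answer word) and to make more explicit the conceptual knowledge that is only vaguely present in the user's cognitive system.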

Usability of Digital Encyclopedia of Korean Local Culture(www.grandculture.net) for local identity (지역정체성 형성을 위한 향토문화전자대전(www.grandculture.net)의 활용)

  • Yun, Yu-Seok
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집), 2008.02b, pp.546-551, 2008
  • "Encyclopedia of Korea Local Culture(www.grandculture.net)" is a digital encyclopedia system for local culture. It contains local history, cultural geography, and a who's who. HCI aims for computers to help human life. What, then, is useful information for people and for human culture? Perhaps knowing one's own local culture makes one happier, because it helps one understand oneself. This article shows how this encyclopedia manages local cultural resources.

The Present and Future of Cleaning Robots (청소용 로봇의 현황과 미래)

  • 박성일
    • Journal of the KSME, v.44 no.4, pp.53-58, 2004
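  • According to its encyclopedic definition, a robot is a machine that performs motions like those of human hands and feet. The word "robot" comes from the Czech word "robota", meaning "to work", and has been in use since the Czech writer K. Čapek published the play Rossum's Universal Robots in 1920. (abridged)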

Automatic Construction of Class Hierarchies and Named Entity Dictionaries using Korean Wikipedia (한국어 위키피디아를 이용한 분류체계 생성과 개체명 사전 자동 구축)

  • Bae, Sang-Joon;Ko, Young-Joong
    • Journal of KIISE:Computing Practices and Letters, v.16 no.4, pp.492-496, 2010
  • Wikipedia, as an open encyclopedia, contains immense human knowledge written by thousands of volunteer editors, and its reliability is also high. In this paper, we propose automatically constructing a Korean named entity dictionary using several features of Wikipedia. First, we generate class hierarchies using the class information from each Wikipedia article. Second, the title of each article is mapped to our class hierarchies, and we then calculate the entropy value of the root node of each class hierarchy. Finally, we construct a high-performance named entity dictionary by removing the class hierarchies whose entropy value is higher than a threshold. Our experiments achieved an overall F1-measure of 81.12% (precision: 83.94%, recall: 78.48%).
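The entropy-based filtering step can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say which distribution the entropy is computed over, so the code assumes it is the distribution of named entity types among the titles mapped to each hierarchy. The hierarchy names, counts, threshold value, and function names are hypothetical. The last helper simply checks that the reported F1 follows from the reported precision and recall.

```python
import math

# Hypothetical data: for each class hierarchy (keyed by its root), the number
# of mapped article titles per named entity type. Assumed, not from the paper.
TOY_HIERARCHIES = {
    "people": {"PERSON": 9, "LOCATION": 1},
    "misc": {"PERSON": 3, "LOCATION": 3, "ORGANIZATION": 3},
}


def entropy(counts):
    """Shannon entropy (in bits) of a count distribution."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def filter_hierarchies(hierarchies, threshold):
    """Keep only hierarchies whose entropy does not exceed the threshold."""
    return {
        name: counts
        for name, counts in hierarchies.items()
        if entropy(counts) <= threshold
    }


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # The threshold of 1.0 bit is an arbitrary illustrative value.
    print(sorted(filter_hierarchies(TOY_HIERARCHIES, threshold=1.0)))
    print(round(f1_score(83.94, 78.48), 2))
```

A low entropy means the titles under a root mostly share one entity type, which is why such hierarchies are the ones worth keeping for a dictionary.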

EGGCYCLOPEDIA (달걀백과)

  • 대한양계협회
    • KOREAN POULTRY JOURNAL, v.20 no.6 s.224, pp.99-101, 1988
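  • Compiled by the American Egg Board, this book is also called an "encyclopedic course in egg education". It covers egg knowledge broadly, includes information on all related fields, and is arranged alphabetically by topic. It also gives definitions and explanations of egg-related topics and makes appropriate use of illustrations, charts, and graphs to aid understanding, so it can be described as a booklet. This journal introduces it in serial form so that readers can gain accurate and broad knowledge about eggs.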

Desert (사막)

  • Heo, Ju-Hui
    • The Science & Technology, no.3 s.406, pp.12-17, 2003
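  • A world spread out over sand. Deserts are extremely dry. Average annual precipitation is 250 mm or less, and living things struggle to survive. According to the Korean Britannica encyclopedia (한국브리태니커백과사전), about 5% of the Earth's land surface receives rainfall as severely limited as a desert's. In the thin air of the high Himalayas or in deserts where water is hard to find, humans have no choice but to live alongside nature with their actions constrained.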
