• Title/Summary/Keyword: Korean Thesaurus

Search Result 224

The Extraction of Head words in Definition for Construction of a Semi-automatic Lexical-semantic Network of Verbs (동사 어휘의미망의 반자동 구축을 위한 사전정의문의 중심어 추출)

  • Kim Hae-Gyung; Yoon Ae-Sun
    • Language and Information, v.10 no.1, pp.47-69, 2006
  • Recently, interest in constructing and using a Korean thesaurus has surged. This paper presents a semi-automatic method for generating a lexical-semantic network of Korean '-ha' verbs by analyzing their dictionary definitions. First, using several tools that filter and coordinate lexical data, word-definition pairs were prepared for processing in a subsequent step. Then, by inspecting the definitions of each verb, we extracted and coordinated the head words of the sentences that make up each definition. These head words are taken to be the main conceptual words representing the sense of the verb being defined. Using these head words and related information, this paper shows that a thesaurus can be constructed semi-automatically without much difficulty.
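The head-word idea can be illustrated with a minimal sketch. This is not the authors' tool: it assumes each definition has already been morphologically analyzed into (morpheme, tag) pairs with Sejong-style tags (an assumption), and it takes the rightmost verb stem as the head word, relying on the head-final order of Korean. The two entries and their analyses are simplified, hypothetical examples.

```python
from collections import defaultdict

# Hypothetical, simplified definitions already split into (morpheme, tag) pairs.
# Tags follow Sejong-style conventions (an assumption): NNG common noun, JC/JKO particles,
# VV verb stem, EC connective ending, EF final ending.
DEFINITIONS = {
    "공부하다": [("학문", "NNG"), ("이나", "JC"), ("기술", "NNG"), ("을", "JKO"),
             ("배우", "VV"), ("고", "EC"), ("익히", "VV"), ("다", "EF")],
    "운동하다": [("몸", "NNG"), ("을", "JKO"), ("움직이", "VV"), ("다", "EF")],
}


def head_word(morphs):
    """Return the rightmost verb stem plus '다', or None if there is no verb.

    Korean is head-final, so the last predicate of a definition is taken as its head.
    """
    for morph, tag in reversed(morphs):
        if tag == "VV":
            return morph + "다"
    return None


def build_network(definitions):
    """Link each head word to the set of defined verbs whose definitions point to it."""
    network = defaultdict(set)
    for verb, morphs in definitions.items():
        head = head_word(morphs)
        if head is not None:
            network[head].add(verb)
    return dict(network)
```

In a semi-automatic workflow, the extracted (verb, head word) links would then be reviewed by a lexicographer before entering the network; coordinated heads such as 배우다 in "배우고 익히다" are ignored by this simple rule.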

An Integrated Ontological Approach to Effective Information Management in Science and Technology (과학기술 분야 통합 개념체계의 구축 방안 연구)

  • 정영미; 김명옥; 이재윤; 한승희; 유재복
    • Journal of the Korean Society for Information Management, v.19 no.1, pp.135-161, 2002
  • This study presents a multilingual, integrated ontological approach that links classification systems, thesauri, and terminology databases in science and technology for more effective online indexing and information retrieval. In this integrated system, we designed a thesaurus model that takes the concept as its unit and specified the essential data elements of a terminology database on the basis of the ISO 12620 standard. The science and technology classification system adopted in this study provides subject access from other existing classification systems through a mapping table. A prototype system was implemented using nuclear energy as the application area.
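A concept-as-unit thesaurus with a classification mapping table might look roughly like the sketch below. The schema, field names, class codes, and the scheme names `SchemeA`/`SchemeB` are hypothetical; they are not the study's data model and are not taken from ISO 12620. The point shown is that terms in any language resolve to one concept, and an external class code reaches that concept through the mapping table.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus unit, keyed by concept rather than by term (hypothetical schema)."""
    concept_id: str
    pref_terms: dict[str, str]                      # language code -> preferred term
    broader: set[str] = field(default_factory=set)  # ids of broader concepts
    class_code: str = ""                            # code in the integrated classification


# Hypothetical sample data for the nuclear energy domain.
CONCEPTS = {
    "C1": Concept("C1", {"en": "nuclear reactor", "ko": "원자로"}, class_code="N10"),
    "C2": Concept("C2", {"en": "nuclear fuel", "ko": "핵연료"}, class_code="N20"),
}

# Hypothetical mapping table: (external scheme, external code) -> integrated class code.
MAPPING_TABLE = {("SchemeA", "A-17"): "N10", ("SchemeB", "B-3"): "N20"}


def find_by_term(term: str, lang: str) -> list[Concept]:
    """Any language's preferred term leads to the same concept."""
    return [c for c in CONCEPTS.values() if c.pref_terms.get(lang) == term]


def translate(term: str, src: str, dst: str) -> list[str]:
    """Cross-language lookup through the shared concept."""
    return [c.pref_terms[dst] for c in find_by_term(term, src) if dst in c.pref_terms]


def via_external_class(scheme: str, code: str) -> list[Concept]:
    """Subject access from another classification system through the mapping table."""
    target = MAPPING_TABLE.get((scheme, code))
    if target is None:
        return []
    return [c for c in CONCEPTS.values() if c.class_code == target]
```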

A Study on Added-Term Relationship of Thesaurus (시소러스의 부가관계에 관한 연구)

  • 한상길
    • Journal of the Korean Society for Information Management, v.17 no.2, pp.119-138, 2000
  • The purpose of this study is to propose ways of expanding added-term relations to suit the new information retrieval environment. This report reviews the ISO 2788 and ANSI/NISO Z39.19 standards, and compares and analyzes the added-term relations used in 20 current thesauri to identify their problems and limitations. Based on these findings, it suggests how to expand thesaurus added-term relations to accommodate changes in the information retrieval environment, as follows. First, definitions are separated from scope notes, and scope notes are categorized by type. Second, principles for the use of qualifiers are presented. Third, principles for expanding descriptor term information are presented.
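The three proposals can be pictured as a descriptor record, sketched below under stated assumptions: the definition is a field of its own, scope notes are keyed by a type, and a qualifier disambiguates homographs. The scope-note categories and field names are hypothetical and do not reproduce the study's typology; rendering the qualifier in parentheses follows a common thesaurus convention.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ScopeNoteType(Enum):
    """Hypothetical scope-note categories; the study's own typology is not reproduced here."""
    USAGE = "usage"        # how indexers should apply the term
    BOUNDARY = "boundary"  # what the term excludes
    HISTORY = "history"    # changes to the term over time


@dataclass
class Descriptor:
    term: str
    qualifier: str | None = None   # disambiguates homographs
    definition: str | None = None  # stored separately from scope notes
    scope_notes: dict[ScopeNoteType, str] = field(default_factory=dict)

    def display_form(self) -> str:
        """Show the qualifier in parentheses, a common thesaurus convention."""
        return f"{self.term} ({self.qualifier})" if self.qualifier else self.term


mercury = Descriptor(
    term="Mercury",
    qualifier="planet",
    definition="The planet closest to the Sun.",
    scope_notes={ScopeNoteType.BOUNDARY: "For the chemical element, use Mercury (metal)."},
)
```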

A Study of Ontology Construction Using Thesaurus: Transformation of Thesaurus into SKOS (시소러스를 활용한 온톨로지 구축방안 연구 - 시소러스의 SKOS 변환을 중심으로 -)

  • Han, Sung-Kook; Lee, Hyun-Sil
    • Journal of the Korean BIBLIA Society for Library and Information Science, v.17 no.1, pp.285-303, 2006
  • This study proposes a step-by-step method for converting thesauri to SKOS, formalized as a three-stage conversion process, and develops outputs and guidelines for each stage. The stages are: (1) collecting and analyzing thesauri to understand their term structure and the semantics of their relations; (2) defining the conversion method and creating an ontology of the thesauri; and (3) verifying that the forms and the various semantic relations of the thesauri are preserved, then creating the SKOS ontology. The method can be applied to thesauri with complex conceptual relations. Future work should develop a stage-by-stage conversion algorithm from the proposed method and then implement the conversion.
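A minimal sketch of the conversion, assuming a thesaurus stored with the classic BT/NT/RT/UF/SN tags, is shown below. It maps the tags onto the standard SKOS properties (`skos:broader`, `skos:narrower`, `skos:related`, `skos:altLabel`, `skos:scopeNote`) and emits Turtle using only the standard library. The `example.org` namespace and the sample records are hypothetical, and the reciprocity check only loosely mirrors the preservation check of stage (3).

```python
from urllib.parse import quote

BASE = "http://example.org/thesaurus/"  # hypothetical namespace for concept URIs

# Hypothetical source thesaurus using the classic tags:
# BT broader term, NT narrower term, RT related term, UF used for (non-preferred), SN scope note.
THESAURUS = {
    "Ontologies": {"BT": ["Knowledge representation"], "RT": ["Thesauri"], "UF": ["Ontology models"]},
    "Knowledge representation": {"NT": ["Ontologies"]},
    "Thesauri": {"RT": ["Ontologies"], "SN": ["Controlled vocabularies with hierarchical relations."]},
}

# SKOS properties corresponding to each inter-concept tag.
TAG_TO_SKOS = {"BT": "skos:broader", "NT": "skos:narrower", "RT": "skos:related"}


def concept_uri(term):
    """Mint an IRI for a preferred term (spaces become underscores, the rest is percent-encoded)."""
    return "<" + BASE + quote(term.replace(" ", "_")) + ">"


def literal(text, lang="en"):
    """Turtle string literal with a language tag; escapes backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_turtle(thesaurus):
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for term, rels in thesaurus.items():
        props = [("a", "skos:Concept"), ("skos:prefLabel", literal(term))]
        props += [("skos:altLabel", literal(alt)) for alt in rels.get("UF", [])]
        props += [("skos:scopeNote", literal(note)) for note in rels.get("SN", [])]
        for tag, prop in TAG_TO_SKOS.items():
            props += [(prop, concept_uri(target)) for target in rels.get(tag, [])]
        body = " ;\n    ".join(f"{p} {o}" for p, o in props)
        lines.append(f"{concept_uri(term)}\n    {body} .\n")
    return "\n".join(lines)


def missing_reciprocals(thesaurus):
    """Report BT/NT pairs that are not reciprocal and RT links that are not symmetric."""
    problems = []
    for term, rels in thesaurus.items():
        for broader in rels.get("BT", []):
            if term not in thesaurus.get(broader, {}).get("NT", []):
                problems.append((term, "BT", broader))
        for narrower in rels.get("NT", []):
            if term not in thesaurus.get(narrower, {}).get("BT", []):
                problems.append((term, "NT", narrower))
        for related in rels.get("RT", []):
            if term not in thesaurus.get(related, {}).get("RT", []):
                problems.append((term, "RT", related))
    return problems
```

Running the check before export lets relation errors be fixed in the source thesaurus rather than silently carried into the SKOS output.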

Automatic Document Classification Based on k-NN Classifier and Object-Based Thesaurus (k-NN 분류 알고리즘과 객체 기반 시소러스를 이용한 자동 문서 분류)

  • Bang Sun-Iee; Yang Jae-Dong; Yang Hyung-Jeong
    • Journal of KIISE:Software and Applications, v.31 no.9, pp.1204-1217, 2004
  • Numerous statistical and machine learning techniques have been studied for automatic text classification. However, because these techniques train classifiers using only the feature vectors of documents, ambiguity between two candidate categories significantly degrades classification precision. To remedy this drawback, we propose a method that incorporates information about the relationships between categories into existing classifiers. We first classify documents with the k-NN classifier, which is known for relatively good performance despite its simplicity, and then use relationship information from an object-based thesaurus to reduce the ambiguity. By referencing the relationships in the thesaurus that correspond to the structured categories, the ambiguity is removed and the precision of k-NN classification improves substantially. Experimental results show that this method improves precision by up to 13.86% over plain k-NN classification while preserving recall.
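The general pattern, k-NN voting followed by a thesaurus-based decision when the two best categories are nearly tied, can be sketched as follows. This is a swapped-in simplification, not the paper's method: the "thesaurus" here is a hypothetical map from each category to a set of related terms, and the tie is broken by how much document weight falls on those terms. The `margin` parameter and the sample data are illustrative assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency dicts."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_scores(doc, training, k):
    """Sum the similarities of the k nearest training documents per category."""
    ranked = sorted(training, key=lambda ex: cosine(doc, ex[0]), reverse=True)[:k]
    scores = Counter()
    for vec, label in ranked:
        scores[label] += cosine(doc, vec)
    return scores


def classify(doc, training, thesaurus, k=3, margin=0.1):
    """k-NN with a thesaurus tie-break when the two best categories are close.

    `thesaurus` maps each category to a set of terms related to it
    (a hypothetical, much simpler structure than an object-based thesaurus).
    """
    top = knn_scores(doc, training, k).most_common(2)
    if not top:
        return None
    if len(top) == 1 or top[0][1] - top[1][1] > margin:
        return top[0][0]

    def support(category):
        # Weight of document terms that the thesaurus links to this category.
        return sum(w for t, w in doc.items() if t in thesaurus.get(category, set()))

    return max((category for category, _ in top), key=support)


TRAINING = [
    ({"stock": 1, "market": 1}, "economy"),
    ({"market": 1, "election": 1}, "politics"),
    ({"stock": 1, "price": 1}, "economy"),
    ({"election": 1, "vote": 1}, "politics"),
]
THESAURUS = {"economy": {"stock", "price"}, "politics": {"election", "vote", "party"}}
```

With `k=2`, the document `{"market": 1, "stock": 1, "election": 1, "party": 1}` is equally close to one economy and one politics example; plain k-NN cannot separate them, while the thesaurus terms `election` and `party` tip the decision toward politics.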

A Study on Unstructured text data Post-processing Methodology using Stopword Thesaurus (불용어 시소러스를 이용한 비정형 텍스트 데이터 후처리 방법론에 관한 연구)

  • Won-Jo Lee
    • The Journal of the Convergence on Culture Technology, v.9 no.6, pp.935-940, 2023
  • Most text data collected through web scraping for artificial intelligence and big data analysis is large and unstructured, so it must be cleaned before it can be analyzed. The data becomes analyzable structured data through a heuristic pre-processing refinement step followed by a post-processing machine refinement step. In this study's post-processing machine refinement step, a Korean dictionary and a stopword dictionary are used to extract vocabulary for the frequency analysis behind a word cloud. In this step, "user-defined stopwords" are used to efficiently remove stopwords that the dictionary-based step left behind. We propose a methodology for applying a "thesaurus" to this step: to complement the problems of the existing "stopword dictionary" method, we propose a "user-defined stopword thesaurus" technique and examine its pros and cons through a case analysis using R's word cloud technique. We present a comparative verification and argue for the effectiveness of the proposed methodology in practical applications.
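The refinement logic can be sketched independently of R. In this minimal Python sketch (the study itself uses R), a general stopword dictionary removes common function words, and a hypothetical user-defined stopword thesaurus either deletes leftover noise (mapped to an empty string) or merges variants into one canonical word. Whitespace tokenization stands in for the morphological analysis a real Korean pipeline would use; the word lists are illustrative assumptions.

```python
from collections import Counter

# Hypothetical general stopword dictionary.
STOPWORDS = {"그리고", "하지만", "the", "and"}

# Hypothetical user-defined stopword thesaurus: token -> replacement,
# where an empty string means "remove this token".
USER_THESAURUS = {
    "ㅋㅋ": "",             # chat noise missed by the general dictionary
    "빅데이타": "빅데이터",  # spelling variant merged into one canonical word
    "AI": "인공지능",        # synonym merged into one canonical word
}


def refine(tokens):
    """Apply the stopword dictionary, then the user-defined thesaurus."""
    out = []
    for tok in tokens:
        if tok in STOPWORDS:
            continue
        tok = USER_THESAURUS.get(tok, tok)
        if tok:  # drop tokens the thesaurus mapped to ""
            out.append(tok)
    return out


def word_frequencies(text):
    """Frequencies a word cloud would plot; split() stands in for morphological analysis."""
    return Counter(refine(text.split()))
```

Merging variants matters for a word cloud because otherwise "빅데이타" and "빅데이터" would be drawn as two smaller words instead of one larger one.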

An Extended Concept-based Image Retrieval System : E-COIRS (확장된 개념 기반 이미지 검색 시스템)

  • Kim, Yong-Il; Yang, Jae-Dong; Yang, Hyoung-Jeong
    • Journal of KIISE:Computing Practices and Letters, v.8 no.3, pp.303-317, 2002
  • In this paper, we design and implement E-COIRS, which lets users query with concepts and with image features that further refine those concepts. For example, E-COIRS supports the query "retrieve images containing black home appliance to north of reception set." The query includes two types of concepts, IS-A and composite: "home appliance" is an IS-A concept, and "reception set" is a composite concept. To evaluate such a query, E-COIRS includes three important components: a visual image indexer, thesauri, and a query processor. Each pair of objects in an image captured by the visual image indexer is converted into a triple, which consists of the two object identifiers (oids) and their spatial relationship. All the features of an object are referenced by its oid. Composite concepts are detected by the triple thesaurus, and IS-A concepts are recognized by the fuzzy term thesaurus. The query processor obtains an image set by matching each triple in a user query against an inverted file and the CS-Tree. To support efficient storage use and fast retrieval on high-dimensional feature vectors, E-COIRS uses the Cell-based Signature tree (CS-Tree). E-COIRS is a more advanced content-based image retrieval system than other systems that support only concepts or only image features.
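The triple representation and inverted-file lookup can be sketched as below, under simplifying assumptions that are not E-COIRS's actual design: each object has a label and a centroid, the spatial relationship is the dominant compass direction between centroids (image coordinates, so a smaller y means "north"), and a hypothetical `IS_A` table expands a concept into object labels. Composite-concept detection, fuzzy matching, image-feature refinement such as "black", and the CS-Tree are not shown.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical IS-A thesaurus: concept -> object labels it covers.
IS_A = {"home appliance": {"tv", "refrigerator"}}


def spatial_relation(a, b):
    """Direction of centroid a relative to centroid b, using the dominant axis only.

    Assumes image coordinates, where y grows downward, so a smaller y means "north".
    """
    dx, dy = a[0] - b[0], a[1] - b[1]
    if abs(dy) >= abs(dx):
        return "north" if dy < 0 else "south"
    return "west" if dx < 0 else "east"


def image_triples(objects):
    """objects: oid -> (label, (x, y)). Yields one (oid_a, oid_b, relation) per ordered pair."""
    for a, b in permutations(objects, 2):
        yield a, b, spatial_relation(objects[a][1], objects[b][1])


def build_inverted_file(images):
    """Index (label_a, label_b, relation) -> ids of images containing that triple."""
    index = defaultdict(set)
    for image_id, objects in images.items():
        for a, b, rel in image_triples(objects):
            index[(objects[a][0], objects[b][0], rel)].add(image_id)
    return index


def query(index, concept_a, concept_b, relation):
    """Expand IS-A concepts into object labels, then union the matching postings."""
    hits = set()
    for label_a in IS_A.get(concept_a, {concept_a}):
        for label_b in IS_A.get(concept_b, {concept_b}):
            hits |= index.get((label_a, label_b, relation), set())
    return hits


IMAGES = {
    "img1": {"o1": ("tv", (50, 10)), "o2": ("sofa", (50, 80))},
    "img2": {"o1": ("refrigerator", (10, 50)), "o2": ("sofa", (90, 50))},
}
```

Here `query(build_inverted_file(IMAGES), "home appliance", "sofa", "north")` finds `img1`, because the IS-A expansion turns "home appliance" into the label `tv` before the triple lookup.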

A Korean Homonym Disambiguation System Using Refined Semantic Information and Thesaurus (정제된 의미정보와 시소러스를 이용한 동형이의어 분별 시스템)

  • Kim Jun-Su; Ock Cheol-Young
    • The KIPS Transactions:PartB, v.12B no.7 s.103, pp.829-840, 2005
  • Word sense disambiguation (WSD) is one of the most difficult problems in Korean information processing. We propose a WSD model that filters semantic information using the specific characteristics of dictionary definitions and adds information useful for sense determination, such as statistical, distance, and case information. To resolve the issues caused by the scarcity of semantic information data, we also propose a model based on the word hierarchy (thesaurus) of Ulsan University's UOU Word Intelligent Network, a dictionary-based lexical database. Among the WSD models developed in this study, the one that uses statistical, distance, and case information together with the thesaurus (hereinafter the 'SDJ-X model') performed best. In an experiment on the 1,500,000-eojeol sense-tagged corpus provided by the Sejong project, the SDJ-X model improved accuracy over the maximum-frequency sense baseline (MFC) by 18.87% (21.73% for nouns) and over the model using inter-eojeol distance weights by 10.49% (8.84% for nouns, 11.51% for verbs). Finally, the SDJ-X model's accuracy exceeded that of the model using statistical, distance, and case information without the thesaurus by 6.12% (5.29% for nouns, 6.64% for verbs).
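How a thesaurus can relieve data sparseness in WSD is shown by the minimal sketch below. It is not the SDJ-X model: case information is omitted, the 1/distance weight is an illustrative choice, and all counts, sense IDs, and thesaurus entries are hypothetical. Context words are scored against sense-tagged co-occurrence counts, an unseen context word backs off to its hypernym, and the most frequent sense is used when no evidence remains.

```python
from collections import Counter

# Hypothetical statistics from a sense-tagged corpus for the homonym 배 (ship / pear / belly).
SENSE_FREQ = Counter({"배_pear": 10, "배_ship": 7, "배_belly": 5})
COOC = Counter({("배_ship", "바다"): 5, ("배_pear", "과일"): 6, ("배_belly", "아프다"): 4})

# Hypothetical thesaurus: word -> hypernym, used when a context word was never seen.
THESAURUS = {"사과": "과일", "귤": "과일"}


def weight(distance):
    """Nearer eojeols count more; 1/distance is an illustrative choice."""
    return 1.0 / distance


def disambiguate(senses, context):
    """context: list of (word, eojeol distance >= 1) pairs around the target word."""
    scores = Counter()
    for sense in senses:
        for word, distance in context:
            count = COOC[(sense, word)]
            if count == 0 and word in THESAURUS:
                count = COOC[(sense, THESAURUS[word])]  # back off to the hypernym
            scores[sense] += weight(distance) * count
    if not scores or max(scores.values()) == 0:
        return max(senses, key=lambda s: SENSE_FREQ[s])  # most-frequent-sense baseline
    return max(scores, key=scores.get)


SENSES = ["배_ship", "배_pear", "배_belly"]
```

Without the back-off, the context word 사과 (apple) would give no evidence at all, because it never co-occurs with any sense of 배 in the sample counts; its hypernym 과일 (fruit) does.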

A Korean Noun Semantic Hierarchy (Wordnet) Construction

  • Lee, Juho; Koaunghi Un; Bae, Hee-Sook; Park, Key-Sun
    • Proceedings of the Korean Society for Language and Information Conference, 2002.02a, pp.290-295, 2002
  • Because thesauri are used as knowledge resources in many natural language processing systems, they are very useful, and even necessary, for building high-quality systems, especially those that deal with semantics. In this paper, we introduce a semi-automatic method for constructing a Korean noun semantic hierarchy using a monolingual machine-readable dictionary (MRD) and an existing thesaurus.
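One way to combine an MRD with an existing thesaurus semi-automatically is sketched below, as an assumption rather than the authors' procedure. The genus term (the last noun of a definition, given Korean's head-final order) is proposed as the hypernym; links that the existing thesaurus lacks or agrees with are accepted, and conflicting links are queued for human review. The entries, tags, and thesaurus contents are hypothetical.

```python
# Hypothetical MRD entries: noun -> definition as (morpheme, tag) pairs, Sejong-style tags.
MRD = {
    "사과": [("사과나무", "NNG"), ("의", "JKG"), ("열매", "NNG")],
    "참새": [("참샛과", "NNG"), ("의", "JKG"), ("새", "NNG")],
    "고래": [("고래목", "NNG"), ("의", "JKG"), ("포유류", "NNG")],
}

# Hypothetical existing thesaurus: noun -> hypernym.
EXISTING = {"참새": "새", "고래": "동물"}


def genus_term(morphs):
    """Last common or proper noun in the definition (Korean is head-final)."""
    for morph, tag in reversed(morphs):
        if tag in ("NNG", "NNP"):
            return morph
    return None


def propose_hypernyms(mrd, existing):
    """Accept links the thesaurus agrees with or lacks; queue conflicts for a human."""
    accepted, review = {}, {}
    for noun, morphs in mrd.items():
        genus = genus_term(morphs)
        if genus is None:
            continue
        known = existing.get(noun)
        if known is None or known == genus:
            accepted[noun] = genus
        else:
            review[noun] = (genus, known)
    return accepted, review
```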

A Study on Automatic Keyword Classification (용어의 자동분류에 관한 연구)

  • Seo, Eun-Gyoung
    • Journal of the Korean Society for Information Management, v.1 no.1, pp.78-99, 1984
  • In this paper, automatic keyword classification, one of the methods for automatically constructing a retrieval thesaurus, is applied experimentally to Korean, on the premise that a retrieval thesaurus increases retrieval efficiency in natural-language retrieval systems that search machine-readable databases. The paper also proposes ways of applying the technique. The experiment assumes that semantic relationships between terms can be discovered from the statistical patterns with which terms occur in text.
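The underlying assumption, that terms occurring in the same documents are semantically related, can be turned into keyword classes as in the sketch below. The Dice coefficient and single-link grouping via union-find are one standard choice, not necessarily the measure or clustering used in the 1984 experiment; the threshold and sample documents are illustrative assumptions.

```python
from itertools import combinations


def dice(docs_a, docs_b):
    """Dice coefficient between the sets of documents in which two terms occur."""
    if not docs_a and not docs_b:
        return 0.0
    return 2 * len(docs_a & docs_b) / (len(docs_a) + len(docs_b))


def keyword_classes(documents, threshold=0.5):
    """Group terms whose co-occurrence association meets `threshold` (single-link)."""
    postings = {}
    for doc_id, terms in enumerate(documents):
        for term in set(terms):
            postings.setdefault(term, set()).add(doc_id)

    parent = {t: t for t in postings}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps the trees shallow
            t = parent[t]
        return t

    for a, b in combinations(sorted(postings), 2):
        if dice(postings[a], postings[b]) >= threshold:
            parent[find(a)] = find(b)

    classes = {}
    for t in postings:
        classes.setdefault(find(t), set()).add(t)
    return sorted(sorted(c) for c in classes.values())


DOCS = [["정보", "검색"], ["정보", "검색", "색인"], ["도서관", "사서"], ["도서관", "사서"]]
```

Each resulting class can then serve as a group of mutually related entry terms in a retrieval thesaurus; raising `threshold` yields smaller, tighter classes.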
