• Title/Summary/Keyword: Extraction of headword

Study on Extraction of Headwords for Compilation of 「Donguibogam Dictionary」 - Based on Corpus-based Analysis - (『동의보감사전』 편찬을 위한 표제어 추출에 관한 연구 - 코퍼스 분석방법을 바탕으로 -)

  • Jung, Ji-Hun;Kim, Do-Hoon;Kim, Dong-Ryul
    • The Journal of Korean Medical History / v.29 no.1 / pp.47-54 / 2016
  • This article attempts to extract headwords for the compilation of the "Donguibogam Dictionary" using corpus-based analysis. The digitized original text of Donguibogam is converted into a plain-text file with the program 'EM Editor'. The Chinese characters that occur most frequently in Donguibogam are extracted with the corpus analysis program 'AntConc'. Two-, three-, four-, and five-syllable words containing each high-frequency character are then extracted with n-cluster, one of AntConc's functions. Lastly, the output that is meaningful as a word is selected. As a result, the words that appear often in Donguibogam can be identified; the names of books, medicinal herbs, disease symptoms, and prescriptions appear especially often. Extracting headwords through corpus-based analysis in this way can suggest a better headword list for the "Donguibogam Dictionary" in the future.
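
The two extraction steps described in this abstract (finding high-frequency characters, then collecting the 2- to 5-character clusters that contain them) can be approximated in plain Python. This is only a minimal sketch and not AntConc's implementation: the function names `top_characters` and `clusters_containing` and the toy text are hypothetical, and the sketch assumes that whitespace separates the units that a cluster may not cross.

```python
from collections import Counter

def top_characters(text, k):
    """Return the k most frequent non-whitespace characters in text."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return [ch for ch, _ in counts.most_common(k)]

def clusters_containing(text, char, sizes=(2, 3, 4, 5)):
    """Count every n-character window (n in sizes) that contains char."""
    counts = Counter()
    for segment in text.split():  # assumption: windows never cross whitespace
        for n in sizes:
            for i in range(len(segment) - n + 1):
                gram = segment[i:i + n]
                if char in gram:
                    counts[gram] += 1
    return counts

# Hypothetical toy text for illustration; it is not taken from the corpus.
sample = "人參湯 人參散 人中 黃芪湯"
frequent = top_characters(sample, 1)
candidates = clusters_containing(sample, frequent[0])
for gram, freq in candidates.most_common():
    print(gram, freq)
```

The resulting candidate list still has to be filtered by hand for strings that are meaningful as words, which corresponds to the final sorting step in the abstract.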

Science and Technology Terminology Dictionary Building Process and Workbench Development in Defense Area (국방과학기술 전문용어 사전 구축을 위한 프로세스 및 워크벤치 개발)

  • Choi, Jung-Whoan;Park, Jeong-Ho;Kim, Kyung-Sun;Kim, Pyung
    • The Journal of the Korea Contents Association / v.12 no.8 / pp.420-428 / 2012
  • Standardizing the meaning of terminology is important for improving work efficiency, and terminology dictionaries are therefore being actively built and used in various fields. In the defense area, publishing a defense terminology dictionary is useful for information exchange among the military branches and for distributing standardized terminology. The Defense Agency for Technology and Quality (DTaQ) publishes a terminology dictionary of defense science and technology on a three-year cycle. DTaQ seeks to standardize the construction process of the terminology dictionary and to improve service efficiency by using the dictionary in the defense area. The proposed method is based on the results of a previous study on the standardization of terminology dictionaries. We suggest practical steps including the dictionary construction process, the composition and roles of the organization, the definition of headwords, the selection of target documents from which terminology candidates are extracted, terminology extraction, generation of terminology candidate groups, workbench registration, and construction and validation of the terminology dictionary. A thesaurus and a workbench are developed to use and support the terminology dictionary effectively.

Incremental Enrichment of Ontologies through Feature-based Pattern Variations (자질별 관계 패턴의 다변화를 통한 온톨로지 확장)

  • Lee, Sheen-Mok;Chang, Du-Seong;Shin, Ji-Ae
    • The KIPS Transactions:PartB / v.15B no.4 / pp.365-374 / 2008
  • In this paper, we propose a model that enriches an ontology by incrementally extending its relations through variations of patterns. To generalize the initial patterns, combinations of features are considered as candidate patterns. The candidate patterns are used to extract relations from Wikipedia, and they are ranked by reliability based on corpus frequency. The selected patterns are then used to extract relations, while the extracted relations are in turn used to extend the patterns of each relation. By varying the patterns during the incremental enrichment process, the range of pattern selection is broadened and refined, which can increase the coverage and accuracy of the extracted relations. In experiments with single-feature pattern models, we observe that the lexical, headword, and hypernym features provide reliable information, while POS and syntactic features provide general information that is useful for enriching relations. Based on observations about which feature types are appropriate for each syntactic unit type, we propose a pattern model based on the composition of features as ongoing work.

The Automatic Extraction of Hypernyms and the Development of WordNet Prototype for Korean Nouns using Korean MRD (Machine Readable Dictionary) (국어사전을 이용한 한국어 명사에 대한 상위어 자동 추출 및 WordNet의 프로토타입 개발)

  • Kim, Min-Soo;Kim, Tae-Yeon;Noh, Bong-Nam
    • The Transactions of the Korea Information Processing Society / v.2 no.6 / pp.847-856 / 1995
  • When a human recognizes nouns in a sentence, s/he associates them with the hyper concepts of the nouns. For a computer to simulate human word recognition, a knowledge base (WordNet) of the hyper concepts of words must be built. Until now, work on a WordNet has not been carried out in Korea because it requires a great deal of human effort and time. However, as computing power has improved radically and common MRDs have become available, automatically constructing a WordNet has become more feasible. This paper proposes a method that automatically builds a WordNet of Korean nouns by using the descriptions of nouns in a Korean MRD, and it proposes rules for extracting the hyper concepts (hypernyms) by analyzing structural characteristics of Korean. The rules reflect characteristics such as the headword lying at the rear of a sentence and the descriptive sentences of nouns having a special structure. In addition, a WordNet prototype of Korean nouns is developed by combining the hypernyms produced by these rules. It extracts the hypernyms of about 2,500 sample words, and the results show that about 92 percent of the hypernyms are correct.
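
The head-final observation in this abstract (the headword of a Korean definition tends to lie at the rear of the sentence) can be illustrated with a deliberately naive heuristic. The paper's actual rules are not given in the abstract, so the sketch below is an assumption throughout: `guess_hypernym` simply takes the last whitespace-separated token of a definition, and the two dictionary entries are hypothetical examples rather than entries from the MRD used by the authors. A real system would need morphological analysis to strip particles and handle the special definition structures the abstract mentions.

```python
def guess_hypernym(definition):
    """Return the last token of a definition, without trailing punctuation.

    Naive stand-in for a head-final rule; returns None for an empty definition.
    """
    tokens = definition.strip().rstrip(".。").split()
    return tokens[-1] if tokens else None

# Hypothetical toy entries (noun -> definition), for illustration only.
toy_dictionary = {
    "사과": "사과나무의 열매.",
    "진돗개": "전라남도 진도에서 나는 개.",
}

hypernyms = {noun: guess_hypernym(text) for noun, text in toy_dictionary.items()}
for noun, hypernym in hypernyms.items():
    print(noun, "->", hypernym)
```

Linking each noun to its guessed hypernym in this way yields the kind of noun hierarchy from which a WordNet-style prototype can be assembled.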