• Title/Summary/Keyword: Unknown-noun Recognition

Korean Unknown-noun Recognition using Strings Following Nouns in Words (명사후문자열을 이용한 미등록어 인식)

  • Park, Ki-Tak; Seo, Young-Hoon
    • The Journal of the Korea Contents Association / v.17 no.4 / pp.576-584 / 2017
  • Unknown nouns, which are not in a dictionary, cause problems not only in morphological analysis but in almost every area of natural language processing. This paper describes a method for recognizing Korean unknown nouns using the strings that follow nouns within a word, such as a postposition, a suffix plus postposition, or a suffix plus eomi (verbal ending). We collect and sort words containing nouns from documents, then divide each word containing an unknown noun into two parts, a candidate noun and the string following it, by finding the same prefix morphemes in more than two unknown words. We then use information on strings following nouns, extracted from the Sejong corpus, to make the final decision on each unknown noun. We obtain 99.64% precision and 99.46% recall for unknown nouns that occur in more than two forms in news articles from two portal sites.
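
The splitting step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `recognize_unknown_nouns`, the tiny `FOLLOWING_STRINGS` set (a hypothetical stand-in for the list extracted from the Sejong corpus), and the `min_forms` threshold (set to 3 to follow the "more than two" wording literally) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the strings following nouns that the paper
# extracts from the Sejong corpus; the real list is far larger.
FOLLOWING_STRINGS = {"", "은", "는", "이", "가", "을", "를", "에서", "으로"}


def recognize_unknown_nouns(words, following=FOLLOWING_STRINGS, min_forms=3):
    """Return prefixes shared by at least `min_forms` distinct words whose
    remainders are all known strings following nouns."""
    # Map every prefix of every distinct word to the set of remainders seen.
    remainders = defaultdict(set)
    for word in sorted(set(words)):
        for i in range(1, len(word) + 1):
            remainders[word[:i]].add(word[i:])

    # Accept a prefix as a noun only if it appears in enough different forms
    # and every string following it is in the known list.
    return {
        prefix
        for prefix, rests in remainders.items()
        if len(rests) >= min_forms and all(rest in following for rest in rests)
    }


if __name__ == "__main__":
    sample = ["챗봇은", "챗봇이", "챗봇에서", "먹었다"]
    print(recognize_unknown_nouns(sample))
```

In this sketch a candidate seen in only two forms is rejected; lowering `min_forms` trades precision for recall, which is a design choice the abstract does not specify.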

Step-by-step Approach for Effective Korean Unknown Word Recognition (한국어 미등록어 인식을 위한 단계별 접근방법)

  • Park, So-Young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.05a / pp.369-372 / 2009
  • Recently, newspapers as well as web documents have come to include many newly coined words such as "mid" (meaning "American drama", since "mi" means "America" in Korean and "d" refers to the "d" of drama) and "anseup" (meaning "pathetic", since "an" and "seup" literally mean eyeballs and moist, respectively). However, these words degrade the performance of Korean analysis systems. To recognize these unknown words automatically, this paper proposes a step-by-step approach consisting of an unknown noun recognition phase based on full text analysis, an unknown verb recognition phase based on web document frequency, and an unknown noun recognition phase based on web document frequency. The phase based on full text analysis accurately recognizes unknown words that occur repeatedly in a document. The two phases based on web document frequency broaden coverage to unknown words that occur only once in the document. In addition, the proposed model separates unknown noun recognition from unknown verb recognition so that it can handle various kinds of unknown words. Experimental results show that the proposed approach improves precision by 1.01% and recall by 8.50% compared with a previous approach.

Phase-based Model Using Web Documents for Korean Unknown Word Recognition (웹문서를 이용한 단계별 한국어 미등록어 인식 모델)

  • Park, So-Young
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.9 / pp.1898-1904 / 2009
  • Recently, real documents such as newspapers and blogs have come to include newly coined words such as "Wikipedia". However, most previous information processing technologies cannot deal with these newly coined words because they construct their dictionaries from materials acquired during system development. In this paper, we propose a model that automatically recognizes Korean unknown words missing from a previously constructed dictionary. The proposed model consists of an unknown noun recognition phase based on full text analysis, an unknown verb recognition phase based on web document frequency, and an unknown noun recognition phase based on web document frequency. Through full text analysis, the model accurately recognizes unknown words that occur repeatedly in a document. By using web documents, it also broadens coverage to unknown words that occur only once in the document. In addition, the proposed model can recognize both a Korean unknown verb, whose syllables can change from its base form through inflection, and a Korean unknown noun, whose syllables do not change in any eojeol. Experimental results show that the proposed model improves precision by 1.01% and recall by 8.50% compared with a previous model.
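
The three-phase ordering described in this abstract can be sketched as a simple fall-through pipeline. This is only an illustration under strong assumptions, not the published model: the function names, the `min_repeat` and `min_web` thresholds, the ending tables, and the `web_freq` callable (a hypothetical lookup that returns how many web documents contain a string) are all invented here, and real verb inflection is far more varied than the ending-to-base-form table used below.

```python
from collections import Counter


def splits(eojeol, tails):
    """Return (stem, tail) pairs where the tail is a listed ending."""
    return [
        (eojeol[:i], eojeol[i:])
        for i in range(1, len(eojeol) + 1)
        if eojeol[i:] in tails
    ]


def recognize_in_phases(eojeols, web_freq, noun_tails, verb_tails,
                        min_repeat=2, min_web=100):
    # Count how often each candidate noun stem appears in the whole document.
    counts = Counter(s for e in eojeols for s, _ in splits(e, noun_tails))
    found = {}
    for e in eojeols:
        # Phase 1: unknown noun via full text analysis (stem repeats in the document).
        repeated = [s for s, _ in splits(e, noun_tails) if counts[s] >= min_repeat]
        if repeated:
            best = max(repeated, key=lambda s: (counts[s], -len(s)))
            found[e] = (best, "noun/full-text")
            continue
        # Phase 2: unknown verb via web document frequency of its assumed base form.
        verbs = [s + verb_tails[t] for s, t in splits(e, verb_tails)
                 if web_freq(s + verb_tails[t]) >= min_web]
        if verbs:
            found[e] = (verbs[0], "verb/web")
            continue
        # Phase 3: unknown noun via web document frequency of the stem.
        nouns = [s for s, _ in splits(e, noun_tails) if web_freq(s) >= min_web]
        if nouns:
            found[e] = (nouns[0], "noun/web")
    return found


if __name__ == "__main__":
    fake_web = {"득템하다": 500, "혼밥": 300}  # made-up frequencies for the demo
    result = recognize_in_phases(
        ["챗봇은", "챗봇이", "득템했다", "혼밥을"],
        web_freq=lambda s: fake_web.get(s, 0),
        noun_tails={"", "은", "이", "을"},
        verb_tails={"했다": "하다", "한다": "하다"},  # ending -> base-form ending
    )
    for eojeol, (base, phase) in result.items():
        print(eojeol, base, phase)
```

Passing `web_freq` in as a callable keeps the sketch independent of any particular search service, which the abstract does not name.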