Step-by-step Approach for Effective Korean Unknown Word Recognition

한국어 미등록어 인식을 위한 단계별 접근방법

  • Published : 2009.05.29

Abstract

Recently, newspapers as well as web documents include many newly coined words such as "mid"(meaning "American drama" since "mi" means "America" in Korean and "d" refers to the "d" of drama) and "anseup"(meaning "pathetic" since "an" and "seup" literally mean eyeballs and moist respectively). However, these words cause a Korean analyzing system's performance to decrease. In order to recognize these unknown word automatically, this paper propose a step-by-step approach consisting of an unknown noun recognition phase based on full text analysis, an unknown verb recognition phase based on web document frequency, and an unknown noun recognition phase based on web document frequency. The proposed approach includes the phase based on full text analysis to recognize accurately the unknown words occurred once and again in a document. Also, the proposed approach includes two phases based on web document frequency to recognize broadly the unknown words occurred once in the document. Besides, the proposed model divides between an unknown noun recognition phase and an unknown verb recognition phase to recognize various unknown words. Experimental results shows that the proposed approach improves precision 1.01% and recall 8.50% as compared with a previous approach.

Acknowledgement

Supported by : 한국학술진흥재단