Segmentation of Korean Compound Nouns Using Semantic Category Analysis of Unregistered Nouns

미등록어의 의미 범주 분석을 이용한 복합명사 분해

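  • 강유환 (Department of Computer Engineering, Chungbuk National University) ;
  • 서영훈 (School of Electrical, Electronics and Computer Engineering, Chungbuk National University)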
  • Published : 2004.12.01

Abstract

This paper proposes a method for segmenting compound nouns that contain unregistered nouns into the correct combination of unit nouns, using characteristics of personal names, loanwords, and location names. A Korean personal name generally consists of three syllables, only a relatively small number of syllables are used as family names, and the combinations of second and third syllables are somewhat restricted. In addition, many personal names appear together with clue words in compound nouns. Most loanwords contain one or more syllables that cannot appear in Korean words, or syllable sequences that differ from those of ordinary Korean words. Location names are generally used with clue words that designate districts in compound nouns. Using these characteristics to analyze compound nouns not only makes segmentation more accurate but also helps natural language systems use the semantic categories of the unregistered nouns. Experimental results show that the precision of the method is approximately 98% on average. The precision of personal-name and loanword recognition is about 94% and about 92%, respectively.
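
The clue-word idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `guess_category`, the family-name sample, and both clue-word lists are hypothetical stand-ins chosen for illustration, since the abstract does not give the actual dictionaries or rules. It also assumes the input is in precomposed (NFC) Hangul, so that one character equals one syllable. Loanword detection is omitted.

```python
# Toy resources (assumptions, not taken from the paper).
SURNAMES = ("김", "이", "박", "최", "정")        # sample family-name syllables
PERSON_CLUES = ("교수", "대통령", "의원", "씨")   # sample clue words that follow a name
LOCATION_CLUES = ("시", "군", "구", "동")         # sample district-designating clue words


def guess_category(compound):
    """Return (segments, category) if a toy rule fires, otherwise None."""
    # Person rule: a 3-syllable head starting with a known family-name
    # syllable, immediately followed by a person clue word.
    for clue in PERSON_CLUES:
        if compound.endswith(clue):
            head = compound[: -len(clue)]
            if len(head) == 3 and head[0] in SURNAMES:
                return [head, clue], "person"
    # Location rule: a non-empty head followed by a district clue word.
    for clue in LOCATION_CLUES:
        if compound.endswith(clue) and len(compound) > len(clue):
            return [compound[: -len(clue)], clue], "location"
    return None


print(guess_category("김철수교수"))
print(guess_category("청주시"))
```

A realistic system would also need the restrictions on second and third name syllables and a check against the registered-noun dictionary. Without them, this sketch over-triggers on any word that happens to end in a clue syllable.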

Keywords