Automatic Construction of Korean Two-level Lexicon using Lexical and Morphological Information

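  • 김보겸 (Department of Digital Informatics and Convergence, Chungbuk National University) ;
  • 이재성 (Department of Digital Informatics and Convergence, Chungbuk National University)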
  • Received : 2013.07.29
  • Accepted : 2013.10.04
  • Published : 2013.12.31

Abstract

Two-level morphological analysis is a rule-based method of morphological analysis. It handles morphological alternations with rules and analyzes words using the morpheme connection information stored in a lexicon. The approach is language-independent, and a Korean two-level system has also been developed. That system, however, was of limited practical use because it relied on a very small, manually built lexicon, and it also suffered from over-generation. In this paper, we propose a method for automatically constructing a Korean two-level lexicon for PC-KIMMO from a morpheme-tagged corpus. We also propose a method that addresses the over-generation problem using lexical information and sub-tags. In our experiments, the proposed method reduced over-generation by 68% compared with the previous method and raised the f-measure from 39% to 65%.

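The two-level morphological analysis method is a rule-based approach that handles morphological alternations with rules and analyzes how morphemes combine on the basis of a base lexicon. It is language-independent, and a partial implementation has shown that it can be applied to Korean. However, the existing Korean two-level morphological analyzer relied on a manually built lexicon, so it was very small and of limited practical use, and its many over-generated analyses made it very inefficient. This paper presents a method that automatically builds a large-scale two-level lexicon from the Sejong part-of-speech tagged corpus, which widens the coverage of the analyzer, and that analyzes the combination relations between morphemes using lexical information and sub-tags derived from lexical form, which improves the performance of the analyzer. In the experiments, over-generation was reduced by more than 68% compared with the previous method, and the f-measure improved by more than 25.5 percentage points.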
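
As a rough illustration of the idea summarized in the abstract, the following minimal sketch collects lexicon entries and morpheme connection information from a few morpheme-tagged words. It is not the procedure used in the paper. The input format (`morph/TAG+morph/TAG`), the function names, and the sub-tag scheme (the tag plus `_C` or `_V`, depending on whether the morpheme ends in a final consonant) are all assumptions made for this example. Morphemes that themselves contain `+` are not handled.

```python
from collections import defaultdict

def parse_eojeol(analysis):
    """Split 'morph/TAG+morph/TAG' into [(morph, tag), ...] (assumed format)."""
    pairs = []
    for unit in analysis.split("+"):
        morph, _, tag = unit.rpartition("/")
        pairs.append((morph, tag))
    return pairs

def has_final_consonant(morph):
    """True if the last character is a Hangul syllable with a final consonant."""
    code = ord(morph[-1]) - 0xAC00          # offset into the Hangul Syllables block
    return 0 <= code <= 11171 and code % 28 != 0

def sub_tag(morph, tag):
    # Hypothetical sub-tag: the POS tag refined by one feature of the lexical form.
    # Non-Hangul morphemes fall into the "_V" class in this sketch.
    return f"{tag}_{'C' if has_final_consonant(morph) else 'V'}"

def build_lexicon(analyses):
    lexicon = defaultdict(set)        # sub-tag -> morphemes carrying that sub-tag
    continuations = defaultdict(set)  # sub-tag -> 'morph/TAG' items seen next ('#' = word end)
    for analysis in analyses:
        pairs = parse_eojeol(analysis)
        for i, (morph, tag) in enumerate(pairs):
            cur = sub_tag(morph, tag)
            lexicon[cur].add(morph)
            nxt = "/".join(pairs[i + 1]) if i + 1 < len(pairs) else "#"
            continuations[cur].add(nxt)
    return lexicon, continuations

# Toy input; a real run would read the tagged corpus instead.
corpus = ["먹/VV+었/EP+다/EF", "가/VV+았/EP+다/EF", "책/NNG+을/JKO", "나무/NNG+를/JKO"]
lexicon, continuations = build_lexicon(corpus)

print(sorted(continuations["NNG_C"]))  # ['을/JKO']
print(sorted(continuations["NNG_V"]))  # ['를/JKO']
```

Because the connection table is keyed by sub-tag and stores the following morpheme itself, a combination such as 책 + 를 is not licensed in this toy lexicon, whereas a table keyed only by the plain tag `NNG` would accept it. This is one way in which lexical information and sub-tags can cut down over-generated analyses.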
