Korean Morphological Analysis and POS Tagging Based on Structured Classification over a Lattice

Lattice-based Discriminative Approach for Korean Morphological Analysis

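  • Seung-Hoon Na (Department of Embedded Software, Busan University of Foreign Studies);
  • Chang-Hyun Kim (Language Processing Research Section, Electronics and Telecommunications Research Institute);
  • Young-Kil Kim (Language Processing Research Section, Electronics and Telecommunications Research Institute)
  • Received: 2013.12.18
  • Accepted: 2014.05.09
  • Published: 2014.07.15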

Abstract

This paper proposes a lattice-based discriminative approach to Korean morphological analysis and POS tagging, cast as structured classification over a lattice. Given an input sentence, a morpheme lattice is first built by consulting a lexicon: each node corresponds to a morpheme in the lexicon, and each edge connects two adjacent morphemes. A candidate analysis is represented as a path through the lattice, that is, a sequence of edges that starts at the initial state and ends at the final state. Morphological analysis then reduces to finding the highest-scoring path among all possible paths, and the morphemes on that path are returned as the analysis result. In experiments on the ETRI POS-tagged corpus, the proposed lattice-based method achieves higher eojeol (word-phrase) accuracy and sentence accuracy than an existing method based on a first-order linear-chain CRF.
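The lattice construction and best-path search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the hand-set `weights` table (standing in for a trained discriminative model), the feature names, and the functions `build_lattice`, `score_edge`, and `best_path` are all assumptions made for illustration. The sketch also only segments surface strings and does not restore changed morpheme forms.

```python
from collections import defaultdict

def build_lattice(sentence, lexicon):
    """Group lattice nodes (start, end, surface, tag) by start position."""
    starts = defaultdict(list)
    n = len(sentence)
    for i in range(n):
        for j in range(i + 1, n + 1):
            surface = sentence[i:j]
            for tag in lexicon.get(surface, ()):
                starts[i].append((i, j, surface, tag))
    return starts

def score_edge(prev, node, weights):
    """Linear score of the edge prev -> node (hypothetical feature set)."""
    return (weights.get(("bigram", prev[3], node[3]), 0.0)
            + weights.get(("morph", node[2], node[3]), 0.0))

def best_path(sentence, lexicon, weights):
    """Viterbi search for the highest-scoring path; None if no path exists."""
    n = len(sentence)
    starts = build_lattice(sentence, lexicon)
    bos, eos = (0, 0, "<s>", "BOS"), (n, n, "</s>", "EOS")
    best = {bos: (0.0, None)}      # node -> (best score, back-pointer)
    ends = defaultdict(list)       # end position -> reachable nodes
    ends[0].append(bos)
    for i in range(n):
        # Every node ending at i starts before i, so ends[i] is complete here.
        for node in starts[i]:
            cands = [(best[p][0] + score_edge(p, node, weights), p)
                     for p in ends[i]]
            if cands:
                best[node] = max(cands, key=lambda c: c[0])
                ends[node[1]].append(node)
    finals = [(best[p][0] + score_edge(p, eos, weights), p) for p in ends[n]]
    if not finals:
        return None
    score, node = max(finals, key=lambda c: c[0])
    path = []
    while node != bos:             # follow back-pointers to the initial state
        path.append((node[2], node[3]))
        node = best[node][1]
    return path[::-1], score

# Toy data (assumed): candidate morphemes and tags for "학교에서" ("at school").
lexicon = {"학교": ["NNG"], "학": ["NNG"], "교": ["NNG"],
           "에서": ["JKB"], "에": ["JKB"], "서": ["VV"]}
weights = {("morph", "학교", "NNG"): 2.0, ("morph", "에서", "JKB"): 2.0,
           ("bigram", "NNG", "JKB"): 1.0, ("bigram", "JKB", "EOS"): 0.5,
           ("bigram", "NNG", "NNG"): -0.5, ("bigram", "JKB", "VV"): -1.0}

result = best_path("학교에서", lexicon, weights)
print(result)  # ([('학교', 'NNG'), ('에서', 'JKB')], 5.5)
```

Because each edge score depends only on the two morphemes it connects, the search is a dynamic program over end positions rather than an enumeration of all paths. In a trained system the weights would be learned from a tagged corpus instead of being set by hand.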

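Project Information

Project number: Development of core technology for knowledge-learning-based automatic interpretation and translation software, easily extensible to multiple languages, achieving a 90%-level interpretation rate for tourism and international events

Lead institution: Korea Evaluation Institute of Industrial Technology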
