Probabilistic Part-Of-Speech Determination for Efficient English-Korean Machine Translation

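  • Sung-Dong Kim (Department of Computer Engineering, Hansung University) ;
  • Il-Min Kim (Department of Computer Engineering, Hansung University)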
  • Received : 2010.09.10
  • Accepted : 2010.11.22
  • Published : 2010.12.31

Abstract

Natural language processing involves several kinds of ambiguity, and English-Korean machine translation in particular must resolve ambiguity at every step of the translation process. As part of an effort to develop a practical English-Korean machine translation system, this paper focuses on resolving the part-of-speech ambiguity of English words in order to make English analysis more efficient. To be integrated into the translation system, the part-of-speech determination module must be fast and accurate while keeping errors to a minimum. This paper proposes a probabilistic method for part-of-speech determination and presents three probabilistic models, built from statistics gathered from the Penn Treebank corpus. Experiments report the accuracy of the proposed models and the degree to which part-of-speech determination improves the efficiency of the machine translation system.
