Resolving Prepositional Phrase Attachment and POS Tagging Ambiguities Using a Maximum Entropy Boosting Model

Resolving English Prepositional Phrase Attachment and Part-of-Speech Ambiguities Using a Maximum Entropy Boosting Model

  • Seong-Bae Park (박성배), Research Institute of Advanced Computer Technology, Seoul National University
  • Published: 2003.06.01

Abstract

Maximum entropy models are promising candidates for natural language modeling. However, two major hurdles arise when applying them to real-life language problems such as prepositional phrase (PP) attachment: feature selection and high computational complexity. In this paper, we propose a maximum entropy boosting model that overcomes these limitations as well as the problem of imbalanced data in natural language resources, and we apply it to PP attachment and part-of-speech (POS) tagging. In experiments on the Wall Street Journal corpus, the model achieves 84.3% accuracy on PP attachment and 96.78% accuracy on POS tagging, close to the state-of-the-art performance on these tasks, with only a small modeling effort.

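Maximum entropy models are a good method for modeling natural language. However, applying them to real language problems such as prepositional phrase attachment raises two problems: feature selection and computational complexity. In this paper, we present a maximum entropy boosting model that addresses these problems, as well as the imbalanced-data problem found in natural language resources, and apply it to resolving English prepositional phrase attachment and part-of-speech ambiguities. Experiments on the Wall Street Journal corpus show 84.3% accuracy on prepositional phrase attachment and 96.78% accuracy on part-of-speech tagging, comparable to the best results reported so far, even though very little effort was spent on modeling the problems.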
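
The abstract does not spell out the boosting model itself, so the sketch below shows only a plain conditional maximum entropy classifier of the kind it starts from, applied to PP attachment. Everything here is an illustrative assumption: the (verb, noun1, preposition, noun2) quadruple representation, the feature templates in `predicates`, the toy data, and the function names `probs`, `train_gis`, and `predict` are hypothetical and not taken from the paper. Weights are fitted with Generalized Iterative Scaling (GIS), a standard maximum entropy training method.

```python
import math
from collections import defaultdict

# Attachment labels: "V" = the PP attaches to the verb, "N" = it attaches to noun1.
LABELS = ("V", "N")


def predicates(quad):
    """Contextual predicates for one (verb, noun1, prep, noun2) quadruple (assumed feature set)."""
    v, n1, p, n2 = quad
    return ["bias", "v=" + v, "n1=" + n1, "p=" + p, "n2=" + n2]


def probs(weights, quad):
    """p(y | x) = exp(sum_i lambda_i * f_i(x, y)) / Z(x) over the attachment labels."""
    scores = {y: sum(weights.get((pred, y), 0.0) for pred in predicates(quad)) for y in LABELS}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train_gis(data, iterations=200):
    """Fit the feature weights lambda_i with Generalized Iterative Scaling."""
    # Empirical counts of each binary (predicate, label) feature; only observed pairs become features.
    empirical = defaultdict(float)
    for quad, label in data:
        for pred in predicates(quad):
            empirical[(pred, label)] += 1.0
    weights = {feat: 0.0 for feat in empirical}
    # C bounds the number of features active on any (x, y); the correction feature is omitted.
    C = max(len(predicates(quad)) for quad, _ in data)
    for _ in range(iterations):
        # Model expectation of each feature, summed over the training events.
        expected = defaultdict(float)
        for quad, _ in data:
            p = probs(weights, quad)
            for y in LABELS:
                for pred in predicates(quad):
                    if (pred, y) in weights:
                        expected[(pred, y)] += p[y]
        # GIS update: lambda_i += (1 / C) * log(empirical_i / expected_i).
        for feat in weights:
            weights[feat] += math.log(empirical[feat] / expected[feat]) / C
    return weights


def predict(weights, quad):
    """Return the attachment label with the highest conditional probability."""
    p = probs(weights, quad)
    return max(p, key=p.get)


# Hypothetical toy training data, not taken from the paper or the WSJ corpus.
data = [
    (("ate", "pizza", "with", "fork"), "V"),
    (("ate", "pizza", "with", "anchovies"), "N"),
    (("bought", "shares", "in", "company"), "N"),
    (("bought", "shares", "in", "june"), "V"),
]
weights = train_gis(data)
print(predict(weights, ("ate", "salad", "with", "fork")))  # prints V
```

Each GIS iteration recomputes the model expectation of every feature over every training event and every label, which gives a rough sense of the computational cost the abstract mentions. The sketch omits the usual GIS correction feature and relies on `C` being an upper bound on the number of active features per event, which still keeps each update from decreasing the training likelihood.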
