A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors

  • Son, Jeong-Woo (School of Computer Science and Engineering, Kyungpook National University) ;
  • Noh, Tae-Gil (School of Computer Science and Engineering, Kyungpook National University) ;
  • Park, Seong-Bae (School of Computer Science and Engineering, Kyungpook National University)
  • Received : 2011.06.26
  • Accepted : 2012.03.07
  • Published : 2012.03.25

Abstract

Existing taggers treat all types of part-of-speech (POS) tagging errors equally. However, these errors are not equally important: some seriously degrade the performance of subsequent natural language processing tasks, while others do not. This paper aims to minimize such serious errors while retaining the overall performance of POS tagging. Two gradient loss functions are proposed to reflect the different types of errors; they assign a larger cost to serious errors and a smaller cost to minor ones. A series of experiments shows that a classifier trained with the proposed loss functions not only reduces serious errors but also achieves slightly higher accuracy than ordinary classifiers.
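To make the idea of "a larger cost for serious errors and a smaller cost for minor errors" concrete, here is a minimal sketch of one common way to build such a loss: a multiclass hinge loss with margin rescaling, where the required margin between the gold tag and a competing tag grows with the cost of confusing them. This is not the paper's two loss functions, which the abstract does not define. The coarse tag grouping (`COARSE`), the rule that confusions within a group are minor and across groups are serious, and the cost values `MINOR_COST = 0.5` and `SERIOUS_COST = 2.0` are all hypothetical choices for illustration.

```python
from typing import Dict, Tuple

# Hypothetical coarse grouping of Penn Treebank tags (an assumption, not from
# the paper): confusing tags within a group counts as a "minor" error,
# confusing tags across groups counts as a "serious" error.
COARSE = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ",
}

MINOR_COST = 0.5    # assumed cost of a within-group confusion
SERIOUS_COST = 2.0  # assumed cost of a cross-group confusion


def cost(gold: str, pred: str) -> float:
    """Cost of predicting `pred` when the correct tag is `gold`."""
    if gold == pred:
        return 0.0
    return MINOR_COST if COARSE[gold] == COARSE[pred] else SERIOUS_COST


def cost_sensitive_hinge(scores: Dict[str, float],
                         gold: str) -> Tuple[float, Dict[str, float]]:
    """Margin-rescaled multiclass hinge loss and its subgradient w.r.t. scores.

    loss = max_y [ cost(gold, y) + scores[y] - scores[gold] ]
    The y == gold term equals 0, so the loss is never negative.
    """
    best_tag, loss = gold, 0.0
    for tag, s in scores.items():
        violation = cost(gold, tag) + s - scores[gold]
        if violation > loss:
            best_tag, loss = tag, violation
    grad = {tag: 0.0 for tag in scores}
    if loss > 0.0:
        grad[best_tag] += 1.0  # descent lowers the most violating tag's score
        grad[gold] -= 1.0      # descent raises the gold tag's score
    return loss, grad


if __name__ == "__main__":
    scores = {"NN": 1.0, "NNS": 0.8, "VB": 0.9, "JJ": 0.1}
    print(cost_sensitive_hinge(scores, "NN"))
```

With gold tag `NN`, the verb `VB` scores below `NN` yet still produces the largest violation (2.0 + 0.9 - 1.0 = 1.9) because a noun/verb confusion is treated as serious, whereas `NNS`, which scores nearly as high, only needs a 0.5 margin. The update therefore concentrates on separating the serious confusion first.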
