A Study on Optimization of Support Vector Machine Classifier for Word Sense Disambiguation

  • Lee, Yong-Gu (이용구; Dept. of Library & Information Science, Keimyung University)
  • Received : 2011.03.22
  • Accepted : 2011.04.06
  • Published : 2011.04.30

Abstract

This study sought to identify the context window size and term weighting method that give a support vector machine (SVM) classifier its best performance in word sense disambiguation. Korean newspaper articles served as the test collection. Four context windows around the target word were compared: three local contexts (3 words to the left and right, one sentence, and 50 bytes to the left and right) and one global context (the entire newspaper article). Four weighting methods were compared: two simple frequency schemes, Binary and Term Frequency (TF), and two normalized schemes, TF ${\times}$ Inverse Document Frequency (IDF) and Log TF ${\times}$ IDF. Among the context windows, the 50-byte window performed best. Among the weighting methods, Binary weighting performed best.
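
The context windows and weighting schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names `word_window` and `weight_context` are hypothetical, and the exact formulas are assumptions, since the abstract does not state them. IDF is taken here as log(N / df) and Log TF as 1 + log(tf). Only the 3-word window is shown; the sentence, 50-byte, and whole-document windows would need sentence splitting and encoding decisions that the abstract does not specify.

```python
import math
from collections import Counter


def word_window(tokens, target_index, n=3):
    """Return up to n tokens on each side of the target word (target excluded)."""
    left = tokens[max(0, target_index - n):target_index]
    right = tokens[target_index + 1:target_index + 1 + n]
    return left + right


def weight_context(context, doc_freq, n_docs):
    """Compute the four weights for each distinct term in one context.

    doc_freq maps a term to the number of training contexts containing it.
    The IDF and Log TF variants below are assumptions, not the paper's formulas.
    """
    term_freq = Counter(context)
    weights = {}
    for term, freq in term_freq.items():
        idf = math.log(n_docs / doc_freq[term])  # assumed IDF: log(N / df)
        weights[term] = {
            "binary": 1.0,                               # term is present
            "tf": float(freq),                           # raw term frequency
            "tf_idf": freq * idf,                        # TF x IDF
            "log_tf_idf": (1.0 + math.log(freq)) * idf,  # assumed Log TF x IDF
        }
    return weights


# Usage with made-up tokens and document frequencies (illustrative only).
tokens = ["he", "sat", "on", "the", "bank", "of", "the", "river", "today"]
context = word_window(tokens, tokens.index("bank"), n=3)
doc_freq = {"sat": 20, "on": 100, "the": 100, "of": 100, "river": 5}
features = weight_context(context, doc_freq, n_docs=100)
```

Each dictionary of weights corresponds to one feature vector that could be passed to an SVM library. Under Binary weighting every term present in the window contributes the same value, whatever its frequency.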
