Robust Algorithms for Combining Multiple Term Weighting Vectors for Document Classification

  • Kim, Minyoung (Department of Electronics & IT Media Engineering, Seoul National University of Science & Technology)
  • Received : 2016.04.03
  • Accepted : 2016.06.20
  • Published : 2016.06.30

Abstract

Term weighting is a popular technique that weights term features to improve accuracy in document classification. While several successful term weighting algorithms have been proposed, none of them performs consistently well across different data domains. In this paper we propose principled methods for combining different term weight vectors to yield a robust document classifier that performs consistently well on diverse datasets. Specifically, we suggest two approaches: (i) learning a single weight vector that lies in the convex hull of the base vectors while minimizing the class prediction loss, and (ii) a minimax classifier that achieves robustness over the individual weight vectors by minimizing the loss of the worst-performing strategy among the base vectors. We provide efficient solution methods for these optimization problems. The effectiveness and robustness of the proposed approaches are demonstrated on several benchmark document datasets, where they significantly outperform existing term weighting methods.
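The convex-hull combination in approach (i) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name `learn_convex_combination` is hypothetical, a logistic loss stands in for the paper's classification loss, and the simplex constraint on the mixing weights is handled with an assumed softmax parameterization rather than the paper's solver.

```python
import numpy as np

def learn_convex_combination(X, y, V, lr=1.0, iters=500):
    """Learn mixing weights alpha on the probability simplex so that the
    combined term-weight vector w = alpha @ V minimizes the logistic loss
    of the linear scores X @ w.
    X: (n, d) document-term features, y: (n,) labels in {-1, +1},
    V: (K, d) base term-weight vectors (e.g. tf-idf and supervised variants).
    """
    S = X @ V.T                      # (n, K): score under each base vector
    theta = np.zeros(V.shape[0])     # unconstrained softmax parameters
    for _ in range(iters):
        a = np.exp(theta - theta.max())
        a /= a.sum()                 # point in the simplex -> convex hull of V
        s = S @ a                    # combined classification scores
        p = 1.0 / (1.0 + np.exp(y * s))      # = sigmoid(-y * s)
        g_a = -(y * p) @ S / len(y)          # d(mean logistic loss)/d(alpha)
        theta -= lr * a * (g_a - a @ g_a)    # chain rule through the softmax
    a = np.exp(theta - theta.max())
    a /= a.sum()
    return a, a @ V                  # mixing weights, combined weight vector
```

On synthetic data where one base vector separates the classes and another does not, the learned mixture concentrates on the informative vector, which is the intended robustness behavior: no single fixed weighting has to be trusted in advance.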


Cited by

  1. Simultaneous Learning of Sentence Clustering and Class Prediction for Improved Document Classification vol.17, pp.1, 2017, https://doi.org/10.5391/IJFIS.2017.17.1.35