Exploring an Optimal Feature Selection Method for Effective Opinion Mining Tasks

  • Eo, Kyun Sun (SKKU Business School, Sungkyunkwan University)
  • Lee, Kun Chang (SKKU Business School/SAIHST (Samsung Advanced Institute of Health Sciences & Technology), Sungkyunkwan University)
  • Received : 2018.11.13
  • Reviewed : 2019.01.07
  • Published : 2019.02.28

Abstract

This paper aims to identify the most effective feature selection method for opinion mining tasks. Opinion mining is a form of sentiment analysis that, from a text mining perspective, classifies the opinions expressed in online texts as positive or negative. Using a dataset covering five product groups (apparel, books, DVDs, electronics, and kitchen), TF-IDF and Bag-of-Words (BOW) representations are calculated to form the product review feature sets. Next, we apply the feature selection methods to see which one yields the most robust results. The results show that the stacking classifier built on the features selected by the Information Gain method yields the best result.
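The feature weighting and Information Gain ranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy reviews, the function names, and the particular TF-IDF variant (term frequency normalized by document length, natural-log inverse document frequency) are all assumptions made for illustration, and the stacking classifier itself is not covered.

```python
import math
from collections import Counter

# Hypothetical toy reviews; NOT taken from the paper's five-product dataset.
REVIEWS = [
    ("great book loved it", "pos"),
    ("great sound great price", "pos"),
    ("loved the fit", "pos"),
    ("terrible book waste of money", "neg"),
    ("terrible sound broke fast", "neg"),
    ("waste of money", "neg"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(term, docs):
    """Reduction in label entropy from knowing whether `term` occurs
    in a document (binary Bag-of-Words presence)."""
    labels = [label for _, label in docs]
    present = [label for text, label in docs if term in text.split()]
    absent = [label for text, label in docs if term not in text.split()]
    conditional = 0.0
    for subset in (present, absent):
        if subset:  # skip empty partitions to avoid division by zero
            conditional += len(subset) / len(docs) * entropy(subset)
    return entropy(labels) - conditional


def tf_idf(term, text, docs):
    """One common TF-IDF variant (an assumption, see lead-in):
    (count / document length) * ln(N / document frequency)."""
    tokens = text.split()
    tf = tokens.count(term) / len(tokens)
    df = sum(1 for other, _ in docs if term in other.split())
    return tf * math.log(len(docs) / df) if df else 0.0


def top_k_features(docs, k):
    """Return the k vocabulary terms with the highest Information Gain."""
    vocab = sorted({tok for text, _ in docs for tok in text.split()})
    ranked = sorted(vocab, key=lambda t: information_gain(t, docs),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    for term in top_k_features(REVIEWS, 6):
        print(term, round(information_gain(term, REVIEWS), 3))
```

In this toy corpus a word such as "book", which appears equally in positive and negative reviews, receives an Information Gain of zero and would be discarded, while words confined to one class rank highest. The retained terms would then form the feature set passed to a classifier.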

Keywords

Table 1. Study of Opinion mining

Table 2. Results of Accuracy

Table 4. Results of AUC

Table 3. The number of features

Table 5. Results of T-test
