Evaluation of Classification Algorithm Performance of Sentiment Analysis Using Entropy Score

  • Park, Man-Hee (Department of Business Administration, Catholic University of Pusan)
  • Received : 2018.04.02
  • Accepted : 2018.05.05
  • Published : 2018.09.30

Abstract

Online customer reviews and social media content are critical information sources for businesses because they influence customers' purchasing decisions. Surveys designed to identify customers' varied needs and complaints are constrained by time and cost. Customer review data from online shopping malls provide an ideal source for analyzing customer sentiment toward products. In this study, we collected product reviews of Samsung and Apple smartphones from Amazon and applied five classification algorithms that previous studies have used as representative sentiment analysis techniques: support vector machines (SVMs), bagging, random forest, classification and regression tree (CART), and maximum entropy. We propose an entropy score that comprehensively evaluates the performance of a classification algorithm. When the five algorithms were evaluated using the entropy score, the SVM algorithm ranked highest.
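The abstract does not define how the entropy score is computed, so the following is only a minimal sketch of one plausible reading: a Shannon-entropy weighting of several performance metrics, combined into one composite score per classifier. The function names (`entropy_weights`, `entropy_scores`), the choice of metrics, the classifier labels, and all numbers are hypothetical illustrations, not values or formulas from the study.

```python
import math

def entropy_weights(matrix):
    """Shannon-entropy weights for the columns (metrics) of a score matrix.

    matrix[i][j] is the non-negative value of metric j for classifier i.
    Metrics whose values differ more across classifiers get larger weights.
    This weighting scheme is an assumption, not the paper's stated method.
    """
    n_rows = len(matrix)
    if n_rows < 2:
        raise ValueError("need at least two classifiers to compare")
    n_cols = len(matrix[0])
    k = 1.0 / math.log(n_rows)  # scales each column's entropy into [0, 1]
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        shares = [value / total for value in column]
        entropy = -k * sum(p * math.log(p) for p in shares if p > 0)
        # Clamp tiny negative values caused by floating-point rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    div_sum = sum(divergences)
    if div_sum < 1e-12:
        # Every metric is (numerically) uniform: fall back to equal weights.
        return [1.0 / n_cols] * n_cols
    return [d / div_sum for d in divergences]

def entropy_scores(matrix):
    """Composite score per classifier: entropy-weighted sum of its metrics."""
    weights = entropy_weights(matrix)
    return [sum(w * v for w, v in zip(weights, row)) for row in matrix]

# Hypothetical metric values (accuracy, precision, recall), made up for the demo.
names = ["clf_a", "clf_b", "clf_c"]
metrics = [
    [0.90, 0.85, 0.80],
    [0.80, 0.80, 0.60],
    [0.70, 0.75, 0.40],
]

weights = entropy_weights(metrics)
scores = entropy_scores(metrics)
best = names[scores.index(max(scores))]
print(dict(zip(names, scores)), "best:", best)
```

In this sketch, a metric on which all classifiers perform alike contributes little to the ranking, while a metric that separates them contributes more. Whether the study weights its metrics this way cannot be determined from the abstract alone.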

