Investigating the Performance of Bayesian-based Feature Selection and Classification Approach to Social Media Sentiment Analysis

A Study on Bayesian Feature Selection and Classification for Social Media Sentiment Analysis

  • Chang Min Kang (SKK Business School, Sungkyunkwan University) ;
  • Kyun Sun Eo (SKK Business School, Sungkyunkwan University) ;
  • Kun Chang Lee (Global Business Administration/Department of Health Sciences & Technology/SAIHST(Samsung Advanced Institute for Health Sciences & Technology), Sungkyunkwan University)
  • Received : 2021.02.16
  • Accepted : 2021.11.25
  • Published : 2022.02.28

Abstract

Social media-based communication has become a crucial part of our personal and professional lives. It is therefore no surprise that social media sentiment analysis has emerged as an important way for companies of all kinds to detect sentiment trends among potential customers. However, social media sentiment analysis suffers from the huge number of sentiment features generated in the course of the analysis. To address this problem, this study proposes a novel method based on Bayesian networks, in which MBFS (Markov Blanket-based Feature Selection) is used to reduce the number of sentiment features. To validate the proposed model, we used online review data from Yelp, a well-known social media platform for evaluating and recommending restaurants, bars, and beauty salons. As benchmark feature selection methods, we used correlation-based feature selection, information gain, and gain ratio. Several machine learning classifiers were also used in the validation: TAN, NBN, Sons & Spouses BN (Bayesian Network), and Augmented Markov Blanket. Furthermore, we conducted a Bayesian network-based what-if analysis to see how the knowledge map between the target node and its related explanatory nodes could yield meaningful insight into the sentiments underlying the target dataset.
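
As an illustration of two of the benchmark filters named above, information gain and gain ratio, the following minimal sketch scores binary term-presence features against a sentiment label. The toy data and the names `entropy`, `information_gain`, `gain_ratio`, `has_great`, and `has_food` are assumptions made for this example and do not come from the study.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on the feature's values."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalized by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature, labels) / split_info


# Hypothetical toy data: four reviews, two candidate term features.
labels = ["pos", "pos", "neg", "neg"]
has_great = [1, 1, 0, 0]  # perfectly aligned with the label
has_food = [1, 0, 1, 0]   # independent of the label

scores = {
    "has_great": (information_gain(has_great, labels), gain_ratio(has_great, labels)),
    "has_food": (information_gain(has_food, labels), gain_ratio(has_food, labels)),
}
```

A filter of this kind ranks features by score and keeps the top ones; here `has_great` would be kept and `has_food` dropped.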

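Sentiment analysis, which examines the emotions hidden in online reviews that users post on social media, has attracted considerable attention as social media has spread. To approach sentiment analysis differently from existing studies, this study proposes a sentiment analysis model based on Bayesian networks. The model uses MBFS (Markov Blanket-based Feature Selection) as its feature selection technique. To demonstrate the performance of MBFS empirically, we used review data from the social media platform Yelp. As benchmark feature selection techniques, we used correlation-based feature selection, information gain feature selection, and gain ratio feature selection. We then compared classification performance across four machine learning algorithms applied to the features chosen by each selection method. Furthermore, to examine the causal relationships among the features selected by MBFS, we conducted a what-if analysis with a Bayesian network. The machine learning classifiers adopted in this study are the Bayesian network-based TAN (Tree Augmented Naive Bayes), NB (Naive Bayes), S-Spouses (Sons & Spouses), and A-markov (Augmented Markov Blanket). The performance analysis showed that the proposed MBFS method outperformed the benchmark methods in terms of accuracy, precision, and F1 score.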
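
The idea behind Markov blanket-based feature selection can be sketched as follows: in a Bayesian network, the Markov blanket of a target node consists of its parents, its children, and the other parents of its children, and only those nodes are kept as features. The sketch assumes the network structure is already given; how the study obtains the network from the Yelp data is not shown here. The graph, the node names, and the function `markov_blanket` are hypothetical.

```python
def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps every node to the list of its parent nodes.
    """
    # Children: nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses: other parents of the target's children.
    spouses = {p for child in children for p in parents[child]} - {target}
    return set(parents.get(target, ())) | children | spouses


# Hypothetical DAG over a sentiment target and four candidate features.
dag_parents = {
    "sentiment": ["service"],        # service -> sentiment
    "great": ["sentiment", "food"],  # sentiment -> great <- food
    "food": [],
    "service": [],
    "price": ["food"],               # food -> price (outside the blanket)
}

selected = markov_blanket(dag_parents, "sentiment")
```

In this toy graph, `price` is excluded: given the blanket nodes, it is conditionally independent of the target, which is why dropping it loses no information about `sentiment`.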

Acknowledgement

This research was supported in 2019 by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2019S1A5A2A01046529).
