A Method of Analyzing Sentiment Polarity of Multilingual Social Media: A Case of Korean-Chinese Languages

Developing a Sentiment Analysis Method for Multilingual Social Media: Focusing on Korean and Chinese

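  • 최미나 (Department of Business Administration, Graduate School, Kyung Hee University) ;
  • 진윤선 (Department of Business Administration, Graduate School, Kyung Hee University) ;
  • 권오병 (School of Management, Kyung Hee University)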
  • Received : 2016.06.24
  • Accepted : 2016.08.11
  • Published : 2016.09.30

Abstract

For social media based marketing, it is crucial to perform sentiment analysis on the unstructured text written by potential consumers of a company's products and services. In particular, companies interested in global business must collect and analyze data from multinational social media (e.g., YouTube, Instagram). Because such texts are multilingual, they are usually translated into a single target language before sentiment analysis is conducted. However, because translation does not account for cultural differences and high-quality dictionaries are lacking, translated sentences often fail to convey the original meaning, which lowers the quality of the sentiment analysis. Hence, this study proposes a method for multilingual sentiment analysis that avoids language translation, focusing on the Korean-Chinese case. To show the feasibility of the proposed idea, we compare the performance of the proposed method with that of legacy methods that rely on language translators. The results suggest that our method outperforms the translation-based methods in terms of RMSE and can be applied by global business organizations.

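Companies perform sentiment analysis of their products or corporate image using what consumers write on social media, and this is one of the important activities in social media based marketing. On global social media in particular, as customers of every nationality increase, consumers from many language communities express diverse opinions in their own languages. Unlike conventional monolingual analysis, sentiment analysis of such multilingual text requires a translation step that unifies it into a single language. With translation, however, differences in linguistic background, culture, and word usage make it impossible to render every word and grammatical construction of the original document accurately. This study therefore proposes a method that does not translate text collected in multiple languages but instead separates the text by language, performs sentiment analysis on each part, and then aggregates the resulting polarity values. To validate the proposed multilingual sentiment analysis algorithm, we compared it, in terms of RMSE (the deviation of the polarity scores from the actual sentiment values), against sentiment analysis of the multilingual sentences translated into Korean and into Chinese. The sentiment values obtained by separating the text by language were closer to the actual sentiment values than those obtained through translation, which demonstrates the advantage of the proposed method. This study is significant in that it attempts multilingual sentiment analysis on the original text as written, without using the algorithms adopted in many similar studies.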
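The separate-then-aggregate idea described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the toy lexicons and their polarity values are invented, splitting by Unicode script (Hangul versus Han characters) is an assumed stand-in for whatever language separation the paper uses, and averaging the per-language scores is an assumed aggregation rule, since the abstract does not state how the polarity values are combined.

```python
import math
import re

# Hypothetical toy lexicons; the polarity values are invented for illustration.
KOREAN_LEXICON = {"좋아요": 1.0, "최고": 1.0, "별로": -0.5, "싫어요": -1.0}
CHINESE_LEXICON = {"喜欢": 1.0, "好": 0.5, "差": -0.5, "讨厌": -1.0}

# Assumption: languages are separated by Unicode script alone.
HANGUL = re.compile(r"[\uac00-\ud7a3]+")  # Hangul syllables
HAN = re.compile(r"[\u4e00-\u9fff]+")     # CJK unified ideographs


def split_by_language(text):
    """Return (korean_segments, chinese_segments) without translating anything."""
    return HANGUL.findall(text), HAN.findall(text)


def lexicon_polarity(segments, lexicon):
    """Mean polarity of lexicon terms found in the segments, or None if no term matches."""
    hits = [score
            for segment in segments
            for term, score in lexicon.items()
            if term in segment]
    return sum(hits) / len(hits) if hits else None


def combined_polarity(text):
    """Score each language separately, then average the available scores (assumed rule)."""
    korean, chinese = split_by_language(text)
    scores = [lexicon_polarity(korean, KOREAN_LEXICON),
              lexicon_polarity(chinese, CHINESE_LEXICON)]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else 0.0


def rmse(predicted, actual):
    """Root mean squared error between predicted and reference polarity values."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    sentences = ["이 제품 최고 我喜欢", "별로 讨厌"]
    human_labels = [1.0, -1.0]  # hypothetical reference polarities
    predictions = [combined_polarity(s) for s in sentences]
    print(predictions, rmse(predictions, human_labels))
```

A translation-based baseline would be evaluated the same way: translate each sentence into one language, score it with that language's lexicon, and compare the resulting RMSE against the value obtained here.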
