An Empirical Comparison of Machine Learning Models for Classifying Emotions in Korean Twitter

An Empirical Comparison of Machine Learning for Emotion Classification in Korean Twitter

  • 임좌상 (Department of Media Software, Sangmyung University)
  • 김진만 (Department of Computer Science, Graduate School, Sangmyung University)
  • Received: 2014.01.15
  • Accepted: 2014.02.19
  • Published: 2014.02.28

Abstract

As online texts have grown rapidly, interest in classifying them automatically with machine learning methods has grown too. Nevertheless, comparatively little research has targeted Korean texts, and studies that evaluate these methods statistically are also rare. This study took a sample of tweets and used machine learning methods to classify the emotions they express, with morphemes and n-grams as features. About 76% of the emotions contained in the tweets were correctly classified. Of the two methods compared, Support Vector Machines (SVM) were more accurate than Naïve Bayes. The linear SVM was not inferior to the non-linear one, and morphological features did not improve accuracy over n-gram features.
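
The abstract names the features (morphemes and n-grams) and the classifiers but does not say how they are built. As a rough illustration only, the sketch below assumes syllable (character) bigrams over Hangul text with whitespace removed, fed to a multinomial Naïve Bayes with add-one smoothing. `syllable_ngrams`, `MultinomialNB`, the toy tweets and the emotion labels are hypothetical. They are not the authors' implementation or data.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def syllable_ngrams(text: str, n: int = 2) -> list[str]:
    """Overlapping character n-grams; after NFC normalization each Hangul syllable is one character."""
    # Assumption: whitespace is dropped, so n-grams may span word boundaries.
    chars = "".join(unicodedata.normalize("NFC", text).split())
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)       # documents per class, used for priors
        self.token_counts = defaultdict(Counter)  # n-gram counts per class
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        self.n_docs = len(labels)
        return self

    def predict(self, tokens: list[str]) -> str:
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            counts = self.token_counts[label]
            total = sum(counts.values())
            score = math.log(n_label / self.n_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # n-grams never seen in training are ignored
                    score += math.log((counts[t] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Hypothetical toy tweets; the paper's corpus and emotion categories are not reproduced here.
    train = [("정말 기쁘다", "joy"), ("너무 행복해", "joy"),
             ("너무 슬프다", "sadness"), ("정말 우울해", "sadness")]
    nb = MultinomialNB().fit([syllable_ngrams(t) for t, _ in train],
                             [label for _, label in train])
    print(nb.predict(syllable_ngrams("행복해요")))     # joy
    print(nb.predict(syllable_ngrams("너무 우울해")))  # sadness
```

Each precomposed Hangul syllable is a single Unicode character. After NFC normalization, character n-grams and syllable n-grams therefore coincide. Morpheme features would instead require a Korean morphological analyzer, which this sketch does not use.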

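As writing online increases, so does research on classifying it with machine learning. Nevertheless, few studies have targeted microblogs written in Korean, and studies that evaluate machine learning statistically are hard to find. In this paper, we drew a sample of tweets and classified their emotions with machine learning, using morphemes and syllables as features. About 76% of the emotions contained in the tweets were classified correctly. Support Vector Machines were more accurate than Naïve Bayes, and for processing unstructured text the linear model was as accurate as the non-linear model. Morpheme features did not yield higher accuracy than syllable features.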
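
The abstract stresses statistical evaluation and reports that the linear SVM was comparable to the non-linear one. It does not name the test behind that comparison. One common way to check whether two classifiers differ on the same test set is McNemar's test. The sketch below illustrates that test under this assumption and is not necessarily what the authors used. The function name `mcnemar` and the toy predictions are hypothetical.

```python
import math


def mcnemar(y_true, pred_a, pred_b):
    """McNemar's test (with continuity correction) for two classifiers scored on the same items.

    Returns (statistic, p_value); under the null hypothesis the statistic is
    approximately chi-square distributed with 1 degree of freedom.
    """
    pairs = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for y, a, p in pairs if a == y and p != y)  # only classifier A correct
    c = sum(1 for y, a, p in pairs if a != y and p == y)  # only classifier B correct
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree on correctness
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical predictions on 30 test tweets: A alone is right on 15, B alone on 5.
    y_true = ["joy"] * 30
    pred_a = ["joy"] * 15 + ["sadness"] * 5 + ["joy"] * 10
    pred_b = ["sadness"] * 15 + ["joy"] * 5 + ["joy"] * 10
    stat, p = mcnemar(y_true, pred_a, pred_b)
    print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

With these toy predictions the statistic is 4.05 and the p-value falls just below 0.05. McNemar's test uses only the items on which the two classifiers disagree, which suits a single shared test split. With repeated cross-validation, a paired test over the fold accuracies is a common alternative.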
