A Comparative Study on Using SentiWordNet for English Twitter Sentiment Analysis

  • Kang, In-Su (School of Computer Science & Engineering, College of Engineering, Kyungsung University)
  • Received : 2013.05.15
  • Accepted : 2013.07.05
  • Published : 2013.08.25

Abstract

Twitter sentiment analysis classifies a tweet (message) as positive or negative. This study deals with Twitter sentiment analysis based on SentiWordNet (SWN), a sentiment dictionary that assigns each sense of an English word a positive and a negative sentiment strength. Existing SWN-based sentiment feature extraction methods vary widely, but they typically first determine the sentiment orientation (SO) of each term in a document from SWN and then derive the SO of the document from those term-level values. For the SO of a term, some studies take the maximum or the average of the positive and negative scores over the term's senses, while others compute the average, over senses, of the difference between the positive and negative scores. For the SO of a document, many researchers take the maximum or the average of the terms' SO values. In addition, this procedure may be applied to all four parts of speech in SWN (adjectives, adverbs, nouns, and verbs) or only to a subset of them. Although many such SWN-based feature extraction procedures have been tried, comprehensive performance comparisons across them are hard to find. This work presents procedures that generalize these ways of using SWN for Twitter sentiment analysis, and compares and analyzes their performance on a well-known Twitter dataset.
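The abstract describes a two-stage pipeline: a term-level SO computed from the scores of a term's SWN senses, then a document-level SO aggregated over the terms, optionally restricted to a subset of parts of speech. Below is a minimal Python sketch of that pipeline under stated assumptions, not the paper's implementation. `TOY_SWN` is a hypothetical in-memory lexicon with made-up scores (not real SentiWordNet values). Input tweets are assumed to be already lemmatized and POS-tagged with WordNet-style tags (`a`, `r`, `n`, `v`). The scheme names `max`, `avg`, and `avg_diff` and the functions `term_so` and `doc_features` are labels invented for this sketch.

```python
from statistics import mean

# HYPOTHETICAL toy lexicon standing in for SentiWordNet. The scores are
# illustrative multiples of 0.125, NOT real SWN values.
# Key: (lemma, WordNet-style POS tag); value: one (pos, neg) pair per sense.
TOY_SWN = {
    ("good", "a"): [(0.75, 0.0), (0.625, 0.125)],
    ("bad", "a"): [(0.0, 0.625), (0.0, 0.875)],
    ("love", "v"): [(0.5, 0.0), (0.25, 0.0)],
    ("movie", "n"): [(0.0, 0.0)],
}

ALL_POS = ("a", "r", "n", "v")  # adjective, adverb, noun, verb
TERM_SCHEMES = ("max", "avg", "avg_diff")


def term_so(senses, scheme):
    """Term-level SO computed from a term's per-sense (pos, neg) scores."""
    if scheme == "max":  # maximum positive and maximum negative over senses
        return (max(p for p, _ in senses), max(n for _, n in senses))
    if scheme == "avg":  # average positive and average negative over senses
        return (mean(p for p, _ in senses), mean(n for _, n in senses))
    if scheme == "avg_diff":  # average of (pos - neg) over senses: one signed value
        return (mean(p - n for p, n in senses),)
    raise ValueError(f"unknown term scheme: {scheme!r}")


def doc_features(tagged_tokens, lexicon, term_scheme, doc_scheme, pos_subset=ALL_POS):
    """Document-level SO features: aggregate term SO values with max or avg."""
    if term_scheme not in TERM_SCHEMES:
        raise ValueError(f"unknown term scheme: {term_scheme!r}")
    if doc_scheme not in ("max", "avg"):
        raise ValueError(f"unknown document scheme: {doc_scheme!r}")
    term_values = [
        term_so(lexicon[(word, tag)], term_scheme)
        for word, tag in tagged_tokens
        # keep tokens whose POS is in the chosen subset and that the lexicon covers
        if tag in pos_subset and (word, tag) in lexicon
    ]
    if not term_values:
        # no scored terms: return zeros of the right width
        return (0.0,) * (1 if term_scheme == "avg_diff" else 2)
    aggregate = max if doc_scheme == "max" else mean
    # zip(*...) groups the positive column, the negative column, and so on
    return tuple(aggregate(column) for column in zip(*term_values))


tweet = [("good", "a"), ("movie", "n"), ("love", "v")]
print(doc_features(tweet, TOY_SWN, "max", "max"))  # (0.75, 0.125)
print(doc_features(tweet, TOY_SWN, "avg_diff", "avg", pos_subset=("a", "v")))  # (0.5,)
```

Each combination of term scheme, document scheme, and POS subset yields one feature variant, which is the kind of design space the study compares. To use real scores, the toy dictionary could be replaced by NLTK's SentiWordNet corpus reader (`nltk.corpus.sentiwordnet.senti_synsets(word, pos)`, whose results expose `pos_score()` and `neg_score()`); that substitution is an assumption, not something the abstract specifies.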
