Building a Korean Sentiment Lexicon Using Collective Intelligence

  • An, Jungkook (Graduate School of Information, Yonsei University) ;
  • Kim, Hee-Woong (Graduate School of Information, Yonsei University)
  • Received : 2015.05.20
  • Accepted : 2015.06.13
  • Published : 2015.06.30

Abstract

The recent rise of big data and social media has brought about an explosion of data. Social networking services are used by people around the world and have become a major communication tool for all ages. Over the last decade, as online social networking sites have grown increasingly popular, companies have come to focus on advanced social media analysis for their marketing strategies. Beyond social media analysis, companies are particularly concerned about the propagation of negative opinions on social networking sites such as Facebook and Twitter, as well as on e-commerce sites. Online word of mouth (WOM), such as product ratings, product reviews, and product recommendations, is highly influential, and negative opinions have a significant impact on product sales. This trend has drawn researchers' attention to natural language processing tasks such as sentiment analysis. Sentiment analysis, also referred to as opinion mining, is the process of identifying the polarity of subjective information, and it has been applied in various research and practical fields. However, Korean (Hangul) poses obstacles for natural language processing because it is an agglutinative language with rich morphology. As a result, Korean natural language processing resources such as sentiment lexicons are scarce, which significantly limits researchers and practitioners who are considering sentiment analysis. Our study builds a Korean sentiment lexicon using collective intelligence and provides an API (Application Programming Interface) service to open and share the sentiment lexicon data with the public (www.openhangul.com). For pre-processing, we created a Korean lexicon database with over 517,178 words and classified them into sentiment and non-sentiment words.
To classify them, we first identified stop words, which are likely to degrade sentiment analysis, and excluded them from our sentiment scoring. In general, sentiment words are nouns, adjectives, verbs, and adverbs, as they can carry positive, neutral, or negative sentiment. Non-sentiment words, on the other hand, are interjections, determiners, numerals, postpositions, and so on, as they generally carry no sentiment. To build a reliable sentiment lexicon, we adopted the concept of collective intelligence as a model for crowdsourcing. In addition, we applied the concept of folksonomy to the classification process to support collective intelligence. To compensate for an inherent weakness of folksonomy, we adopted majority rule by building a voting system. Participants, as voters, were offered three voting options (positive, negative, and neutral), and the voting was conducted on one of the largest social networking sites for college students in Korea. College students in Korea have cast more than 35,000 votes, and we keep the voting system open by maintaining the project as a perpetual study. Moreover, any change in the sentiment score of a word can be an important observation because it lets us track temporal changes in Korean as a natural language. Our study also offers a RESTful, JSON-based API service through a web platform to better support users such as researchers, companies, and developers. Finally, our study makes important contributions to both research and practice. In terms of research, our Korean sentiment lexicon serves as a resource for Korean natural language processing. In terms of practice, practitioners such as managers and marketers can implement sentiment analysis effectively by using the Korean sentiment lexicon we built.
Moreover, our study sheds new light on the value of folksonomy by combining it with collective intelligence, and we expect it to give a new direction and a fresh start to the development of Korean natural language processing.
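
The classification and voting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the part-of-speech tag names, the function names `is_candidate` and `majority_label`, and the choice to leave tied words undecided are all assumptions made for illustration.

```python
from collections import Counter

# Parts of speech the abstract treats as candidate sentiment words.
# The tag names themselves are hypothetical, not the study's tag set.
SENTIMENT_POS = {"noun", "adjective", "verb", "adverb"}

# The three voting options offered to participants.
LABELS = ("positive", "neutral", "negative")


def is_candidate(pos, word, stop_words):
    """Return True if a word should be put to a vote.

    A word qualifies when its part of speech can carry sentiment
    and it is not a stop word.
    """
    return pos in SENTIMENT_POS and word not in stop_words


def majority_label(votes):
    """Return (label, share) for the most-voted label.

    Returns (None, 0.0) when there are no valid votes or when the
    top two labels tie (leaving ties undecided is an assumption).
    """
    counts = Counter(v for v in votes if v in LABELS)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None, 0.0
    label, top = ranked[0]
    return label, top / total


if __name__ == "__main__":
    votes = ["positive", "positive", "negative", "neutral", "positive"]
    print(majority_label(votes))
```

Because the voting system stays open, a function like `majority_label` could be re-run as new votes arrive, so a word's label and vote share can change over time.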

Acknowledgement

Supported by: National Research Foundation of Korea
