Public Sentiment Analysis of Korean Top-10 Companies: Big Data Approach Using Multi-categorical Sentiment Lexicon

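  • 김서인 (School of Business Administration, College of Business, Hanyang University);
  • 김동성 (Department of Business Administration, Graduate School, Hanyang University);
  • 김종우 (School of Business Administration, College of Business, Hanyang University)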
  • Received : 2016.08.18
  • Accepted : 2016.09.24
  • Published : 2016.09.30

Abstract

Sentiment analysis of open Internet data is now widely performed for various purposes. As online communication channels have become popular, companies try to capture public sentiment toward them from open online information sources. This research analyzes public sentiment toward the Korean top-10 companies using a multi-categorical sentiment lexicon. Whereas existing studies that measure public sentiment with a big data approach classify sentiment into dimensions, this research classifies public sentiment into multiple categories. The dimensional sentiment structure has been commonly applied in sentiment analysis across various applications because it is academically established and has the clear advantage of capturing the degree of sentiment and the interrelation among dimensions. However, the dimensional structure is not effective for measuring public sentiment, because human sentiment is too complex to be divided into a few dimensions. In addition, ordinary people need special training to express their feelings in a dimensional structure. People do not divide their sentiment into dimensions, nor do they need psychological training in order to feel. They do not express their feelings along dimensions such as positive/negative or active/passive; rather, they express them as categorical sentiments such as sadness, rage, and happiness. That is, the categorical approach to sentiment analysis is more natural than the dimensional approach. Accordingly, this research proposes a multi-categorical sentiment structure as an alternative way to measure social sentiment from the public's point of view. The multi-categorical sentiment structure classifies sentiments the way ordinary people do, although it may involve some subjectivity.
In this research, nine categories are used as the multi-categorical sentiment structure: 'Sadness', 'Anger', 'Happiness', 'Disgust', 'Surprise', 'Fear', 'Interest', 'Boredom', and 'Pain'. To capture public sentiment toward the Korean top-10 companies, Internet news data about the companies were collected over a 25-month period from a representative Korean portal site. Based on sentiment words extracted from previous studies, we built a sentiment lexicon and analyzed the frequency with which those words appear in the news data. The frequency of each sentiment category was calculated as a ratio of the total number of sentiment words in order to rank the categories and examine their distributions. A sentiment comparison among the top-4 companies, 'Samsung', 'Hyundai', 'SK', and 'LG', was visualized separately. As a next step, the research tested hypotheses to demonstrate the usefulness of the multi-categorical sentiment lexicon, namely how effectively categorical sentiment can serve as a relative comparison index in cross-sectional and time series analysis. To test the effectiveness of the sentiment lexicon as a cross-sectional comparison index, pair-wise t-tests and a Duncan test were conducted. Two pairs of companies, 'Samsung' and 'Hanjin' and 'SK' and 'Hanjin', were chosen for the pair-wise t-tests to examine whether each categorical sentiment differs significantly. Because the 'Sadness' category has the largest vocabulary, it was chosen for the Duncan test to determine whether subgroups of companies differ significantly. The results show that five sentiment categories differ significantly between Samsung and Hanjin, and four between SK and Hanjin. In the 'Sadness' category, six significantly different subgroups were identified. To test the effectiveness of the sentiment lexicon as a time series comparison index, the 'nut rage' incident of Hanjin was selected as an example case.
The term frequency of sentiment words in the month when the incident happened was compared with that of the month before. The sentiment categories were regrouped into positive and negative sentiment to determine whether the event actually had a negative impact on public sentiment toward the company. The difference in each category was visualized, and the change in the word list for the sentiment 'Rage' was shown to make the result more concrete. As a result, there was a large before-and-after difference in the sentiment that ordinary people feel toward the company. Both hypotheses were supported with statistical significance, which suggests that sentiment analysis in the business domain using multi-categorical sentiment lexicons is persuasive. This research implies that categorical sentiment analysis can be used as an alternative method to supplement dimensional sentiment analysis when gauging public sentiment in a business environment.
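The category-frequency step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-category toy lexicon, the English words, the whitespace tokenization, and the function names `category_ratios` and `rank_categories` are all assumptions made for illustration. Korean news text would normally need morphological analysis before lexicon matching.

```python
from collections import Counter

# Hypothetical toy lexicon (category -> sentiment words); NOT the paper's lexicon.
LEXICON = {
    "Sadness": {"sad", "grief", "sorrow"},
    "Anger": {"angry", "rage", "furious"},
    "Happiness": {"happy", "joy", "delight"},
}


def category_ratios(tokens, lexicon=LEXICON):
    """Return each category's share of all matched sentiment words."""
    # Invert the lexicon so each word maps to its category.
    word_to_category = {w: c for c, words in lexicon.items() for w in words}
    counts = Counter(word_to_category[t] for t in tokens if t in word_to_category)
    total = sum(counts.values())
    if total == 0:
        # No sentiment words found: every category gets a ratio of zero.
        return {c: 0.0 for c in lexicon}
    return {c: counts.get(c, 0) / total for c in lexicon}


def rank_categories(ratios):
    """Order categories from the highest ratio to the lowest."""
    return sorted(ratios, key=ratios.get, reverse=True)


# Assumed input: a pre-tokenized news snippet (made-up text).
tokens = "the public was angry and furious but some felt joy".split()
ratios = category_ratios(tokens)
print(rank_categories(ratios))
```

Running the same function on the articles of two different months, for example the month of an incident and the month before it, would give the kind of before-and-after comparison the abstract describes.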

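Recently, attempts to measure sentiment using big data have been actively made. With the development of communication media and social networking services (SNS), companies now need to grasp public sentiment and respond to it immediately. Because the Korean economy depends heavily on large corporations, sentiment analysis of the top-10 companies is meaningful. From this perspective, this study analyzed sentiment toward Korea's top-10 companies using a sentiment lexicon built on multiple categories. Previous studies that analyzed sentiment using big data tend to classify sentiment into dimensions. Dimensional classification has been widely used in sentiment analysis because its classification criteria are academically established, but it requires expert-level knowledge to apply, which makes it ineffective at representing general sentiment and leaves room for improvement. Discrete categorical sentiment is a classification scheme that can address this point; although it involves a certain degree of subjectivity, it is effective for measuring sentiment that people commonly feel. Therefore, to measure general sentiment, this study classified sentiment into discrete categories rather than dimensions, dividing it into nine areas. A sentiment lexicon was built from sentiment words belonging to the nine categories extracted from previous studies, and sentiment was analyzed based on the frequency with which the sentiment words were detected. The target data are news data accumulated on Korea's top-10 companies from January 2014 to January 2016. Based on the frequency of sentiment words detected in the target data, sentiments for each company were ranked and their distributions were examined. Hypotheses were set up and tested on whether sentiment can differ by company and whether a specific event can affect sentiment toward each company. In conclusion, sentiment analysis using the multi-category sentiment lexicon proved significant for both comparisons between companies and comparisons between points in time. This study is meaningful as one alternative for measuring, from the public's perspective, the sentiment scattered throughout big data.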
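The between-company comparison can be sketched as a paired t statistic over monthly category ratios. This is only an illustration under stated assumptions: the abstract does not say exactly how the pair-wise t-test was set up, so pairing observations by month is an assumption here, the function name `paired_t_statistic` is hypothetical, and the numbers are made up rather than taken from the paper.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    n = len(diffs)
    # t = mean difference divided by its standard error.
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Made-up monthly 'Anger' ratios for two hypothetical companies.
company_a = [0.10, 0.12, 0.11, 0.13, 0.09]
company_b = [0.15, 0.18, 0.14, 0.19, 0.16]
t, df = paired_t_statistic(company_a, company_b)
print(f"t = {t:.3f}, df = {df}")
```

The t value would then be compared against a t distribution with `df` degrees of freedom to judge significance. A statistics package would typically be used for the p-value and for a multiple-range procedure such as the Duncan test.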
