Construction of Consumer Confidence index based on Sentiment analysis using News articles

Generating a Consumer Economic Sentiment Index Using News Articles

  • 송민채 (Big Data Analytics, Ewha Womans University) ;
  • 신경식 (School of Business, Ewha Womans University)
  • Received : 2017.05.29
  • Accepted : 2017.09.13
  • Published : 2017.09.30

Abstract

The economic sentiment index and macroeconomic indicators are known to be closely related, because economic agents' judgments and forecasts of business conditions affect economic fluctuations. For this reason, consumer sentiment or confidence provides steady fodder for business and is treated as an important piece of economic information. In Korea, the consumer sentiment index is highly relevant to both private consumption and GDP, which makes it a very important economic indicator for evaluating and forecasting the domestic economic situation. However, despite offering relevant insights into private consumption and GDP, the traditional survey-based approach to measuring consumer confidence has several limits. One weakness is that it takes considerable time to research, collect, and aggregate the data: if an urgent issue arises, timely information is not announced until the end of the month. In addition, the survey only contains information derived from questionnaire items, so it can be difficult to capture the direct effects of newly arising issues. The survey also faces potential declines in response rates and erroneous responses. Therefore, it is necessary to find a way to complement it. For this purpose, we construct and assess an index designed to measure consumer economic sentiment using sentiment analysis. Unlike survey-based measures, our index relies on textual analysis to extract sentiment from economic and financial news articles. Text data such as news articles and SNS posts are timely and cover a wide range of issues; because such sources can quickly capture the economic impact of specific economic issues, they have great potential as economic indicators. There are two main approaches to the automatic extraction of sentiment from text; we apply the lexicon-based approach, which uses sentiment dictionaries of words annotated with their semantic orientations.
In creating the sentiment lexicon, we enter the semantic orientation of individual words manually, and we do not attempt a full linguistic analysis (one that involves analysis of word senses or argument structure); this is a limitation of our research, and further work in that direction remains possible. In this study, we generate a time series index of economic sentiment in the news. The construction of the index consists of three broad steps: (1) collecting a large corpus of economic news articles from the web, (2) applying lexicon-based sentiment analysis to score each article in terms of sentiment orientation (positive, negative, or neutral), and (3) constructing an economic sentiment index of consumers by aggregating monthly time series for each sentiment word. In line with existing scholarly assessments of the relationship between the consumer confidence index and macroeconomic indicators, any new index should be assessed for its usefulness. We examine the new index's usefulness by comparing its relationship with other economic indicators against that of the consumer sentiment index (CSI). Trend and cross-correlation analyses are carried out to examine the relationships and lag structure, and forecasting power is analyzed using one-step-ahead out-of-sample prediction. In almost all experiments, the news sentiment index correlates strongly with related contemporaneous key indicators. Furthermore, in most cases, news sentiment shocks predict future economic activity; in head-to-head comparisons, the news sentiment measures outperform the survey-based sentiment index, the CSI. Policy makers want to understand consumer or public opinions about existing or proposed policies.
Monitoring various web media, SNS, and news articles enables relevant government decision-makers to respond quickly to such opinions. Although research using unstructured data in economic analysis is in its early stages, the utilization of such data is expected to increase greatly once its usefulness is confirmed.
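The three construction steps described in the abstract (collect articles, score each one against a sentiment lexicon, aggregate by month) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy English lexicon, the whitespace tokenizer, the sample articles, and the net-sentiment formula (positive minus negative, divided by their sum). The study itself works with a manually built lexicon and Korean news text.

```python
from collections import defaultdict

# Hypothetical toy lexicon: word -> semantic orientation (+1 positive, -1 negative).
LEXICON = {
    "growth": 1, "recovery": 1, "rise": 1,
    "recession": -1, "slump": -1, "unemployment": -1,
}

def score_article(text, lexicon=LEXICON):
    """Count positive and negative lexicon words in one article."""
    pos = neg = 0
    for token in text.lower().split():
        word = token.strip(".,;:!?\"'()")  # drop surrounding punctuation
        orientation = lexicon.get(word, 0)  # unknown words count as neutral
        if orientation > 0:
            pos += 1
        elif orientation < 0:
            neg += 1
    return pos, neg

def monthly_index(articles, lexicon=LEXICON):
    """Aggregate article scores into a monthly net-sentiment value in [-1, 1].

    articles: iterable of (month, text) pairs, month formatted like "2017-05".
    The (pos - neg) / (pos + neg) formula is an assumed aggregation rule.
    """
    totals = defaultdict(lambda: [0, 0])
    for month, text in articles:
        pos, neg = score_article(text, lexicon)
        totals[month][0] += pos
        totals[month][1] += neg
    index = {}
    for month in sorted(totals):
        pos, neg = totals[month]
        # A month with no sentiment words is treated as neutral.
        index[month] = (pos - neg) / (pos + neg) if pos + neg else 0.0
    return index

# Made-up sample articles, for illustration only.
ARTICLES = [
    ("2017-05", "Exports rise as recovery continues."),
    ("2017-05", "Unemployment edges up."),
    ("2017-06", "Retail slump deepens amid recession fears."),
]
INDEX = monthly_index(ARTICLES)
```

A real pipeline would replace the whitespace split with a morphological analyzer suited to Korean and would likely rescale the monthly values so they are comparable with the survey-based index.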

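Economic agents' judgments and outlooks on business conditions affect economic fluctuations, so economic sentiment indices and macroeconomic indicators are known to be closely related. The economic sentiment indices widely used in Korea as leading indicators include the Consumer Survey, the Business Survey, and the Economic Sentiment Index. However, indices generated from surveys are, by the nature of the data, not timely. In this study, to complement this limitation of structured data, we extract information from unstructured data to generate an economic sentiment index and examine its applicability to economic analysis. As real indicators related to private consumption we use the retail sales index and the service industry production index; as employment indicators, the employment rate and the unemployment rate; and as price indicators, the consumer price inflation rate and the household lending rate. With these we carry out trend analysis across the indicators and cross-correlation analysis to identify the lag structure. Finally, we check the predictability of these indicators. The results show that, compared with the consumer sentiment index, which is widely used as a leading index for other indicators, the new index is highly correlated with the selected indicators and leads them by 1 to 2 months. Forecasting power also improves, confirming the usefulness of the consumer economic sentiment index generated from text data. Text data generated online, such as news articles and SNS posts, are timely and have wide coverage, so when a specific economic issue arises its effect on the economy can be grasped quickly; such data therefore appear to have great potential as indicators for assessing business conditions. Domestic research using unstructured data in economic analysis is at an early stage, but its use is expected to increase greatly once the usefulness of the data is confirmed.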
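The lag structure mentioned above is usually read off a cross-correlation table: correlate the sentiment index at month t with an indicator at month t + k and see which k gives the strongest relationship. The following is a minimal sketch of that idea, not the authors' procedure. The function names and the synthetic series are hypothetical, and the indicator is built to follow the index by exactly two months so that the expected answer is known.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def cross_correlation(index, indicator, max_lag):
    """Return {k: corr(index[t], indicator[t + k])} for k = 0..max_lag.

    A peak at k > 0 suggests the index leads the indicator by k periods.
    """
    result = {}
    for k in range(max_lag + 1):
        x = index[:len(index) - k]   # index values that still have a partner
        y = indicator[k:]            # indicator shifted k periods ahead
        result[k] = pearson(x, y)
    return result

def best_lead(index, indicator, max_lag):
    """Lag with the largest absolute cross-correlation."""
    cc = cross_correlation(index, indicator, max_lag)
    return max(cc, key=lambda k: abs(cc[k]))

# Synthetic monthly data (made up): the indicator repeats the index two months later.
SENTIMENT = [1, 3, 2, 5, 4, 6, 5, 8, 7, 9, 8, 10]
INDICATOR = [0, 0] + SENTIMENT[:-2]

CC = cross_correlation(SENTIMENT, INDICATOR, max_lag=3)
LEAD = best_lead(SENTIMENT, INDICATOR, max_lag=3)
```

With real data the peak would be far less clean, and the series would normally be detrended or differenced before correlating to avoid spurious results from shared trends.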


