Development of Sentiment Analysis Model for the hot topic detection of online stock forums

  • Hong, Taeho (College of Business Administration, Pusan National University) ;
  • Lee, Taewon (Institute of Chinese Studies, Pusan National University) ;
  • Li, Jingjing (Institute of Chinese Studies, Pusan National University)
  • Received : 2015.09.09
  • Accepted : 2016.03.16
  • Published : 2016.03.31

Abstract

Document classification based on emotional polarity has become an important emerging task owing to the explosion of data on the Web. In the big data age, there are too many information sources to consult when making decisions. For example, when considering travel to a city, a person may search for reviews through a search engine such as Google or on social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide whether or not to make the trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology has been widely adopted for text mining of the Web. Sentiment analysis has been used to classify documents through machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs). It is used to determine the attitude, position, and sensibility of people who write articles about various topics published on the Web. Regardless of their polarity, emotional reviews are very helpful material for analyzing customers' opinions. Sentiment analysis helps reveal what customers really want, quickly, through automated text mining techniques. It applies text mining to text on the Web to extract the subjective information that the text contains. In this study, we developed a model that selects hot topics from user posts on China's online stock forum by using the k-means algorithm and the self-organizing map (SOM). In addition, we developed a detection model to predict hot topics by using machine learning techniques such as logit, decision trees, and SVM.
We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. Sentiment analysis calculates a sentiment value for a document by comparing it against a polarity (positive or negative) sentiment dictionary and classifying it accordingly. The online stock forum is an attractive site because of its information about stock investment. Users post numerous texts about stock movements, analyzing the market in light of government policy announcements, market reports, reports from research institutes on the economy, and even rumors. We divided the online forum's topics into 21 categories for sentiment analysis. One hundred forty-four topics were selected from these 21 categories. The posts were crawled to build a positive and negative text database. After preprocessing, we ultimately obtained 21,141 posts on 88 topics, covering March 2013 to February 2015. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on this data. We developed decision tree models to detect hot topics with three algorithms: CHAID, CART, and C4.5. CHAID performed worse than the other two. We also employed SVM to detect the hot topics from negative data. The SVM models were trained with the radial basis function (RBF) kernel, with parameters chosen by grid search. The detection of hot topics by using sentiment analysis provides investors with the latest trends and hot topics in the stock forum, so that they no longer need to search the vast amounts of information on the Web. Our proposed model also helps to rapidly determine customers' signals or attitudes towards government policy and firms' products and services.
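
The dictionary-based scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy English lexicons, the function names `sentiment_value` and `polarity`, and the normalized formula (positive hits minus negative hits, divided by all hits) are assumptions made for the example. The abstract does not state the exact formula, and the study itself worked with Chinese posts and an existing sentiment dictionary.

```python
from typing import Iterable

# Hypothetical toy lexicons, for illustration only. The study used an
# existing sentiment dictionary for Chinese text, which is not reproduced here.
POSITIVE = {"rise", "gain", "bullish", "profit"}
NEGATIVE = {"fall", "loss", "bearish", "crash"}


def sentiment_value(tokens: Iterable[str]) -> float:
    """Return a score in [-1, 1]: (positive hits - negative hits) / all hits.

    Assumed formula for this sketch; tokens found in neither lexicon are ignored.
    """
    pos = 0
    neg = 0
    for token in tokens:
        word = token.lower()
        if word in POSITIVE:
            pos += 1
        elif word in NEGATIVE:
            neg += 1
    matched = pos + neg
    if matched == 0:
        return 0.0  # no dictionary words found, so treat the post as neutral
    return (pos - neg) / matched


def polarity(score: float) -> str:
    """Map a sentiment value to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


post = "Analysts are bullish after the policy news but warn of a crash".split()
print(polarity(sentiment_value(post)))
```

Posts scored this way could then be aggregated per topic to build the positive and negative text database mentioned above; how the aggregation was done in the study is not stated in the abstract.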

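Users of social media have come to interact and share information with one another through the opinions and reviews they write themselves. This has prompted research in a variety of fields that use customer reviews, including opinion mining, web mining, and sentiment analysis. Sentiment analysis in particular aims to identify the attitude, position, and sentiment of the people who wrote directly about a given topic. Because information or data containing customers' opinions is the core data for sentiment analysis, it is effective for analyzing customers' opinions by topic, and firms are using it for marketing matched to consumers' needs while investors are investing heavily in line with market trends. In this study, we select and detect hot topics from previously presented topics using posts written by users on China's online Sina stock forum. Using an existing sentiment dictionary, we classified the sentiment values and polarity of the topics and selected hot topics through cluster analysis. The k-means algorithm was used to select hot topics, and we additionally presented a procedure for selecting hot topics by applying SOM, an artificial intelligence technique. In addition, we developed sentiment analysis models that detect hot topics in advance using data mining techniques such as logit, decision trees, and SVM, and compared the hot topics selected through the interest index with the detected hot topics. By providing information on hot topics, this study makes it possible to follow the latest trends, and hot topics from the stock forum are expected not only to provide useful information to investors in the stock market but also to meet consumers' needs.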
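
The cluster-based selection of hot topics can be illustrated with a small, dependency-free k-means over a one-dimensional interest index. This is a sketch under stated assumptions: the interest values and topic names are made up, the helper names `kmeans_1d` and `select_hot_topics` are hypothetical, and treating the cluster with the highest mean interest as the "hot" cluster is an assumed reading of the abstract, not a documented detail of the study.

```python
from typing import Dict, List, Tuple


def kmeans_1d(values: List[float], k: int = 2, iters: int = 100) -> Tuple[List[int], List[float]]:
    """Plain k-means on scalar values with a deterministic initialization."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    ordered = sorted(values)
    # Spread the initial centroids evenly over the sorted values.
    centroids = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels, centroids


def select_hot_topics(interest: Dict[str, float]) -> List[str]:
    """Return the topics in the cluster with the highest mean interest (assumed rule)."""
    names = list(interest)
    labels, centroids = kmeans_1d([interest[name] for name in names], k=2)
    hot_cluster = max(range(len(centroids)), key=lambda c: centroids[c])
    return [name for name, label in zip(names, labels) if label == hot_cluster]


# Made-up interest index values for five hypothetical topics.
example = {"topic_a": 0.91, "topic_b": 0.12, "topic_c": 0.85, "topic_d": 0.20, "topic_e": 0.15}
print(select_hot_topics(example))
```

A self-organizing map could be swapped in for the clustering step to cross-check the selection, as the study did with k-means and SOM.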
