
Predicting stock movements based on financial news with systematic group identification

Korean title: Stock Price Prediction Using Systematic Group Identification and News

  • Seong, NohYoon (성노윤), College of Business, Korea Advanced Institute of Science and Technology (KAIST)
  • Nam, Kihwan (남기환), College of Business, Korea Advanced Institute of Science and Technology (KAIST)
  • Received : 2019.05.17
  • Accepted : 2019.09.19
  • Published : 2019.09.30

Abstract

Because stock price forecasting is an important issue both academically and practically, stock price prediction has been actively researched. Forecasting studies can be divided into those that use structured data and those that use unstructured data. With structured data such as historical stock prices and financial statements, past studies usually applied technical analysis or fundamental analysis. In the big data era, the amount of information has increased rapidly, and artificial intelligence methods that extract meaning by quantifying text, the unstructured data that makes up a large share of this information, have developed quickly. With these developments, many studies now apply text mining to online news to predict stock prices. Most of them forecast a company's stock price using only news about that target company. However, according to previous research, a company's stock price is affected not only by its own news but also by news about related companies. Finding highly related companies is not easy because of market-wide effects and random signals, so existing studies have mainly identified related companies using predetermined international industry classification standards. According to recent research, however, the Global Industry Classification Standard varies in homogeneity across sectors, and forecasting stock prices by pooling all companies in a sector, rather than only the truly relevant ones, can degrade predictive performance. To overcome this limitation, we are the first to combine random matrix theory with text mining for stock prediction. When the dimension of the data is large relative to the sample size, classical limit theorems are no longer suitable because statistical efficiency is reduced.
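The dimensionality point can be made concrete with the standard random-matrix benchmark. This is a sketch of the usual econophysics formulation, not formulas stated in the abstract: assume N stocks, T observations of standardized returns (so σ² = 1), eigenvalues of the sample correlation matrix C sorted so that λ₁ ≥ λ₂ ≥ … ≥ λ_N with eigenvectors u_i, and N_G denoting the number of eigenvalues other than λ₁ that lie above λ₊. For purely random series, eigenvalues fall inside the Marchenko-Pastur band, and C can be split into market, group, and random parts:

```latex
\lambda_{\pm} = \sigma^{2}\left(1 \pm \sqrt{\tfrac{1}{Q}}\right)^{2}, \qquad Q = \frac{T}{N} \ge 1,

C = \underbrace{\lambda_{1} u_{1} u_{1}^{\top}}_{C^{m}\ (\text{market mode})}
  + \underbrace{\sum_{i=2}^{N_{G}+1} \lambda_{i} u_{i} u_{i}^{\top}}_{C^{g}\ (\text{group})}
  + \underbrace{\sum_{i=N_{G}+2}^{N} \lambda_{i} u_{i} u_{i}^{\top}}_{C^{r}\ (\text{random})}
```

For example, with N = 100 stocks and T = 400 trading days, Q = 4, so λ₊ = (1 + 0.5)² = 2.25 and λ₋ = (1 − 0.5)² = 0.25. Any eigenvalue at or below 2.25 is statistically indistinguishable from noise, which is why raw pairwise correlations can mislead.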
Therefore, a simple correlation analysis in the financial market does not reveal the true correlation. To address this, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and recover the true correlation between companies. Using this true correlation, we perform cluster analysis to find relevant companies. Based on the clustering result, we then use multiple kernel learning (MKL), an ensemble of support vector machines (SVMs), to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel is assigned to the financial news features of either the target firm or its relevant firms. The results of this study are as follows. (1) Consistent with existing research, forecasting stock prices with news from relevant companies is effective. (2) Identifying relevant companies in the wrong way can lower the prediction performance of AI models. (3) When cluster analysis is performed on the true correlation, obtained by removing market-wide effects and random signals, the proposed approach with random matrix theory outperforms previous studies. The contributions of this study are as follows. First, it shows that random matrix theory, which is used mainly in econophysics, can be combined with artificial intelligence to produce effective methodologies. This suggests that it is important not only to develop AI algorithms but also to draw on theories from physics, and it extends existing research that integrated artificial intelligence with complex systems theory through transfer entropy. Second, it stresses that identifying the right related companies in the stock market is an important issue, suggesting that researchers should study not only AI algorithms but also how to select input values on a theoretical basis.
Third, we confirmed that firms grouped together under the Global Industry Classification Standard (GICS) may have low relevance to one another, and we suggest that relevance should be defined theoretically rather than simply taken from GICS.
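The correlation-cleaning and clustering step can be sketched with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the authors' code: returns arrive as a T x N matrix, the largest eigenvalue is treated as the market mode, eigenvalues at or below the Marchenko-Pastur upper edge (1 + sqrt(N/T))² are treated as noise, and only the remaining "group" part of the correlation matrix is clustered. The abstract does not name the clustering algorithm, so average-linkage hierarchical clustering is a stand-in, and the function names `group_correlation` and `cluster_stocks` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def group_correlation(returns: np.ndarray) -> np.ndarray:
    """Keep only the 'group' part of the correlation matrix.

    returns: T x N array (rows = days, columns = stocks).
    """
    T, N = returns.shape
    # Standardize each column so z.T @ z / T is the sample correlation matrix.
    z = (returns - returns.mean(axis=0)) / returns.std(axis=0)
    corr = (z.T @ z) / T
    # Marchenko-Pastur upper edge for a purely random correlation matrix.
    lambda_plus = (1.0 + np.sqrt(N / T)) ** 2
    # eigh returns ascending eigenvalues; reverse to get descending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    c_group = np.zeros_like(corr)
    # Skip index 0 (assumed market mode) and drop eigenvalues in the noise band.
    for i in range(1, N):
        if eigvals[i] > lambda_plus:
            c_group += eigvals[i] * np.outer(eigvecs[:, i], eigvecs[:, i])
    return c_group


def cluster_stocks(c_group: np.ndarray, n_clusters: int) -> np.ndarray:
    """Hierarchical clustering on the cleaned group correlation (stand-in method)."""
    # Renormalize to a unit diagonal; C^g is positive semidefinite, so rho lies in [-1, 1].
    scale = np.sqrt(np.clip(np.diag(c_group), 1e-12, None))
    rho = c_group / np.outer(scale, scale)
    # Correlation distance sqrt(2 * (1 - rho)), clipped against rounding error.
    dist = np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

On synthetic returns built from one market factor plus two sector factors, the market mode is discarded and the stocks separate along the sector factors. On real data the number of clusters has to be chosen; one option is to tie it to the number of retained eigenvalues.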

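In the big data era, the amount of information has increased rapidly, and artificial intelligence methods that extract meaning by quantifying text, which makes up a large share of that information, have developed alongside it. As a result, attempts to predict stock prices from online news through text mining have become more diverse. These methods usually forecast a company's stock price from news about that company. However, a company's stock price is affected not only by its own news but also by news about highly related companies. Finding highly related companies is difficult because of market-wide common effects and random signals, so existing studies have mainly relied on predetermined international industry classification standards. According to recent research, however, these standards differ in homogeneity across sectors, and in low-homogeneity sectors, forecasting stock prices by considering all member companies together can hurt performance. To overcome this limitation, this paper is the first in stock prediction research to use random matrix theory, mainly used in econophysics, to remove market-wide effects and random signals, and then performs cluster analysis to find highly related companies. Building on this, we propose an artificial intelligence model that uses multiple kernel learning while jointly considering news about the highly related companies. The results show that when random matrix theory is used to remove market-wide effects and random signals, and cluster analysis is performed on the resulting accurate correlation coefficients, the model outperforms previous studies.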
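The multi-source kernel idea described in the abstract can be sketched with scikit-learn. This is a simplified stand-in, not the authors' model: it assumes each news source (the target firm's news and the pooled news of its relevant firms) is already vectorized into a feature matrix, labels are up/down moves coded as -1/+1, and each source gets an RBF kernel. Dedicated MKL solvers such as EasyMKL learn the kernel weights by optimization; here, as an assumption, each kernel is weighted by its alignment with the label kernel y yᵀ, and a single SVM is trained on the combined precomputed kernel. The function names and the `gamma` and `C` defaults are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def alignment_weights(kernels, y):
    """Weight each kernel by its (uncentered) alignment with y y^T; y must be in {-1, +1}."""
    yy = np.outer(y, y)
    # <K, yy^T>_F = y^T K y >= 0 for a PSD kernel, so every score is non-negative.
    scores = np.array(
        [np.sum(K * yy) / (np.linalg.norm(K) * np.linalg.norm(yy)) for K in kernels]
    )
    return scores / scores.sum()


def fit_multi_source_svm(train_views, y_train, gamma=1.0, C=1.0):
    """train_views: list of (n_samples, n_features) arrays, one per news source."""
    kernels = [rbf_kernel(X, gamma=gamma) for X in train_views]
    weights = alignment_weights(kernels, y_train)
    # Convex combination of the per-source kernels is itself a valid kernel.
    K_train = sum(w * K for w, K in zip(weights, kernels))
    model = SVC(kernel="precomputed", C=C).fit(K_train, y_train)
    return model, weights


def predict_multi_source_svm(model, weights, test_views, train_views, gamma=1.0):
    # Test kernel has shape (n_test, n_train), as SVC(kernel="precomputed") expects.
    K_test = sum(
        w * rbf_kernel(X_test, X_train, gamma=gamma)
        for w, X_test, X_train in zip(weights, test_views, train_views)
    )
    return model.predict(K_test)
```

When one source is informative and the other is pure noise, the informative source receives the larger weight. A learned MKL solver would replace `alignment_weights` while keeping the same one-kernel-per-source structure.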


