A Real-Time Stock Market Prediction Using Knowledge Accumulation

지식 누적을 이용한 실시간 주식시장 예측

  • Received : 2011.08.09
  • Accepted : 2011.08.16
  • Published : 2011.12.31

Abstract

One of the major problems in data mining is the size of the data, as most data sets are now very large. Streams of data are normally accumulated into data storage or databases, and transactions on the Internet, on mobile devices, and in ubiquitous environments produce such streams continuously. Some data sets lie unused inside large data stores because of their size, while others are lost as soon as they are created because, for various reasons, they are never saved. How to use such large data sets, and how to use data on a stream efficiently, are challenging questions in data mining research. Stream data is a data set that accumulates in data storage continuously from a data source. In many cases the size of this data set grows steadily over time, and mining information from it consumes considerable resources such as storage, money, and time. These characteristics make it difficult and expensive to store all the stream data accumulated over time. On the other hand, if only recent or partial data is used to mine information or patterns, valuable information can be lost. To avoid these problems, this study suggests a method that efficiently accumulates information or patterns over time in the form of rule sets. A rule set is mined from each data set in the stream and accumulated into a master rule set storage, which also serves as a model for real-time decision making. One main advantage of this method is that it takes much less storage space than the traditional method, which saves the whole data set. Another advantage is that the accumulated rule set is itself used as a prediction model. Because the rule set is ready to be used at any time, prompt responses to user requests are possible, which enables real-time decision making and is the greatest advantage of this method.
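The accumulation step described above can be illustrated with a minimal sketch. The abstract does not specify the rule induction algorithm or the storage layout, so everything here is an assumption made for illustration: `mine_rules`, `MasterRuleSet`, the one-condition rule form (feature == value -> label), and the `min_conf` threshold are hypothetical names and choices, not the authors' implementation.

```python
from collections import Counter, defaultdict


def mine_rules(chunk, min_conf=0.6):
    """Mine one-condition rules (feature == value -> label) from one chunk.

    `chunk` is a list of (features: dict, label) pairs. Returns a dict that
    maps (feature, value, label) to (hits, covered) counts. This simple
    miner is a stand-in; the paper's induction algorithm is not given here.
    """
    counts = defaultdict(Counter)
    for features, label in chunk:
        for name, value in features.items():
            counts[(name, value)][label] += 1

    rules = {}
    for (name, value), labels in counts.items():
        label, hits = labels.most_common(1)[0]
        covered = sum(labels.values())
        if hits / covered >= min_conf:  # keep only sufficiently confident rules
            rules[(name, value, label)] = (hits, covered)
    return rules


class MasterRuleSet:
    """Hypothetical master rule storage that doubles as the prediction model."""

    def __init__(self):
        self.stats = {}  # (feature, value, label) -> [hits, covered]

    def accumulate(self, rules):
        """Merge a newly mined rule set; the raw chunk can then be discarded."""
        for key, (hits, covered) in rules.items():
            entry = self.stats.setdefault(key, [0, 0])
            entry[0] += hits
            entry[1] += covered

    def predict(self, features, default=None):
        """Let every matching rule vote with its accumulated confidence."""
        votes = Counter()
        for (name, value, label), (hits, covered) in self.stats.items():
            if features.get(name) == value:
                votes[label] += hits / covered
        return votes.most_common(1)[0][0] if votes else default
```

In this sketch only the rule counts are kept between chunks, which is where the storage saving comes from, and `predict` can be called at any time between calls to `accumulate`.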
According to theories of ensemble approaches, a combination of many different models can produce a prediction model with better performance. The consolidated rule set effectively covers the whole data set, whereas the traditional sampling approach covers only part of it. This study uses stock market data, which is heterogeneous because the characteristics of the data vary over time. Stock market indexes can fluctuate whenever an event influences the market, so the variance of the values of each variable is large compared with that of a homogeneous data set. Prediction with a heterogeneous data set is naturally much more difficult than with a homogeneous one, because the underlying situation is less predictable. This study tests two general mining approaches and compares their prediction performance with that of the method suggested here. The first approach induces a rule set from the recent data set to predict a new data set. The second induces a rule set from all the data accumulated since the beginning, every time a new data set has to be predicted. We found that neither of these performs as well as the accumulated rule set method. Furthermore, the study reports experiments with two different prediction models. The first builds a prediction model using only the more important rule sets, and the second uses all the rule sets, assigning weights to the rules based on their performance. The second shows better performance than the first. The experiments also show that the suggested method can be an efficient approach for mining information and patterns from stream data. A limitation of this study is that its application is confined to stock market data; applying the method to a more dynamic real-time stream data set is desirable.
Another open problem is that, as the number of rules increases over time, special rules such as redundant or conflicting rules have to be managed efficiently.
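The two prediction models and the conflicting-rule issue mentioned above can be sketched as follows. The abstract does not define how rule importance or rule weights are computed, so the `accuracy` score, the cutoff `k`, and all function names below are hypothetical choices for illustration only.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    feature: str
    value: str
    label: str
    accuracy: float  # assumed performance score in [0, 1]


def _vote(rules, features, weighted):
    """Tally the labels of all rules whose condition matches `features`."""
    votes = Counter()
    for r in rules:
        if features.get(r.feature) == r.value:
            votes[r.label] += r.accuracy if weighted else 1.0
    return votes.most_common(1)[0][0] if votes else None


def predict_top_rules(rules, features, k=2):
    """First model: keep only the k best-scoring rules, one vote each."""
    top = sorted(rules, key=lambda r: r.accuracy, reverse=True)[:k]
    return _vote(top, features, weighted=False)


def predict_weighted(rules, features):
    """Second model: every matching rule votes, weighted by its score."""
    return _vote(rules, features, weighted=True)


def conflicting_conditions(rules):
    """Return conditions that appear with more than one predicted label."""
    labels = defaultdict(set)
    for r in rules:
        labels[(r.feature, r.value)].add(r.label)
    return {cond for cond, seen in labels.items() if len(seen) > 1}


# Made-up rules and one observation, purely to show the two models differing.
example_rules = [
    Rule("trend", "up", "buy", 0.9),
    Rule("volume", "high", "sell", 0.8),
    Rule("rsi", "low", "buy", 0.5),
    Rule("macd", "pos", "buy", 0.4),
    Rule("trend", "up", "sell", 0.3),
]
observation = {"volume": "high", "rsi": "low", "macd": "pos"}

print(predict_top_rules(example_rules, observation))  # sell
print(predict_weighted(example_rules, observation))   # buy
print(conflicting_conditions(example_rules))          # {('trend', 'up')}
```

On this toy input the top-rule model sees only one matching rule, while the weighted model lets two weaker rules outvote it. This illustrates how the two models can disagree; it says nothing about which performs better on real data.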

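Stream data is data that accumulates continuously from a data source into data storage. The size of the accumulated data grows over time. Extracting information from such large-volume data also requires storage space, time, and many other resources. These characteristics of stream data make the large volume of data accumulated over time difficult and costly to use. If only a recent portion of the accumulated data is used to extract information or patterns, useful information that could be found in the full data may be lost because only a small sample is used. To address these problems, this study extracts rules from the data as it is generated, rather than continuing to collect the data itself, and thereby manages knowledge efficiently. This method requires less storage space than existing methods. In addition, the accumulated rule set is ready at any time for real-time prediction. According to ensemble theory, which combines multiple prediction models, better prediction performance is possible when rule sets are systematically merged over time as proposed in this study. This study used stock market data to predict stock market volatility, and compared the prediction accuracy of the proposed method with that of existing methods.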
