Frequent Items Mining based on Regression Model in Data Streams

Predicting Frequent Items Based on Regression Analysis in Stream Data

  • 이욱현 (Department of Computer Information, Hanbuk University)
  • Published : 2009.01.28

Abstract

Data in a stream environment are massive, continuous, and unbounded, yet stream processing tasks such as query processing and data analysis must run with limited disk and memory capacity. In such an environment, traditional frequent pattern discovery over a transaction database cannot be applied, because it is difficult to continuously maintain information on whether an itemset in the incoming stream is frequent. In this paper, we propose a method that predicts frequent items by applying a regression model to continuously arriving stream data. A regression model built from the stream data can serve as a prediction model for uncertain items. Through a variety of experiments, we show that the proposed method can be used efficiently on data in a stream environment.
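The abstract does not state which regression form or windowing scheme the paper uses, so the following is only a minimal sketch of the general idea under stated assumptions: the stream is cut into fixed-size windows, each item's relative support per window is fitted with a simple least-squares line, and an item is predicted to be frequent when the extrapolated support for the next window reaches a minimum support threshold. All function names and parameters (`window_counts`, `fit_line`, `predict_frequent`, `window_size`, `min_support`) are hypothetical and not taken from the paper.

```python
from collections import Counter


def window_counts(stream, window_size):
    """Split the stream into consecutive fixed-size windows and count items in each.

    Trailing items that do not fill a whole window are ignored (an assumption).
    """
    counts = []
    for start in range(0, len(stream) - window_size + 1, window_size):
        counts.append(Counter(stream[start:start + window_size]))
    return counts


def fit_line(ys):
    """Ordinary least-squares fit of y = a + b*t for t = 0, 1, ..., n-1."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx if sxx else 0.0  # a single window gives a flat line
    intercept = y_mean - slope * t_mean
    return intercept, slope


def predict_frequent(stream, window_size, min_support):
    """Return items whose extrapolated next-window support is >= min_support."""
    counts = window_counts(stream, window_size)
    items = set().union(*counts) if counts else set()
    predicted = {}
    for item in items:
        # Relative support of the item in each observed window.
        ys = [c[item] / window_size for c in counts]
        intercept, slope = fit_line(ys)
        forecast = intercept + slope * len(ys)  # support at the next window index
        if forecast >= min_support:
            predicted[item] = forecast
    return predicted


if __name__ == "__main__":
    # Three windows of size 4: "aaab", "aabb", "abbb".
    stream = list("aaabaabbabbb")
    print(predict_frequent(stream, window_size=4, min_support=0.5))
```

In this toy stream the support of `b` rises across windows while that of `a` falls, so only `b` is forecast to be frequent in the next window. Only the fitted coefficients and the current window need to be kept per item, which is the kind of bounded-memory behavior the abstract motivates.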
