Incremental Regression based on a Sliding Window for Stream Data Prediction

Sliding Window-Based Incremental Regression Analysis for Stream Data Prediction

  • Published : 2007.12.15

Abstract

Conventional time series prediction techniques use a model that is generated in the training step and then applied to new input data without any change. If such a model is applied directly to stream data, prediction accuracy decreases. This paper proposes a stream data prediction technique that uses a sliding window and regression, and that accounts for time series characteristics that may change over time. The technique is composed of two steps. The first step applies a fractal (dimension-splitting) process that prepares the input data for the regression model. The second step updates the model incrementally as new data arrive. In addition, the model is maintained from only the recent data held in a queue. This approach has two advantages. First, it keeps only the minimum information of the model in a matrix, so space complexity is reduced. Second, it prevents the error rate from increasing by updating the model over time. The accuracy of the proposed method is measured by RME (Relative Mean Error) and RMSE (Root Mean Square Error). In stream data prediction experiments, the proposed technique, IMQR (Incremental Multiple Quadratic Regression), produced better results than MLR (Multiple Linear Regression) and SVR (Support Vector Regression).

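With the recent development of sensor networks, much real-world data is being collected in real time with a time attribute. Conventional time series prediction techniques perform prediction without updating the model. However, stream data is collected very rapidly and its characteristics can change over time, so applying conventional time series prediction techniques is not appropriate. This paper therefore proposes a stream data prediction technique that uses a sliding window and incremental regression. To feed stream data into a multiple regression model, the technique splits the data into several attributes through dimension splitting (fractal), and it updates the regression model incrementally with a sliding window to reflect the changing data distribution. It also uses a fixed-size queue so that the model is maintained from recent data only. Because the model is updated through a matrix holding minimal information, without retaining previous data, the technique has low space complexity, and updating the model incrementally prevents the error rate from increasing. The validity of the proposed technique was measured using RME (Relative Mean Error) and RMSE (Root Mean Square Error), and in the experiments it outperformed other techniques.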
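The idea of keeping only a small matrix plus a fixed-size queue can be illustrated with a short sketch. This is not the authors' IMQR implementation. It assumes that the attribute split is a simple lag embedding, that the quadratic model uses squared terms without cross terms, and that the "minimum information" is the pair of normal-equation sums X^T X and X^T y. All names (`SlidingWindowRegressor`, `quadratic_features`, `rmse`) are hypothetical.

```python
from collections import deque

import numpy as np


def quadratic_features(lags):
    """Map lag values to [1, x_1..x_d, x_1^2..x_d^2] (assumed form, no cross terms)."""
    x = np.asarray(list(lags), dtype=float)
    return np.concatenate(([1.0], x, x ** 2))


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))


class SlidingWindowRegressor:
    """Least-squares model refreshed from the most recent `window` samples only."""

    def __init__(self, n_lags=2, window=50):
        self.n_lags = n_lags
        self.max_window = window
        n_params = 1 + 2 * n_lags
        self.xtx = np.zeros((n_params, n_params))  # running sum of phi phi^T
        self.xty = np.zeros(n_params)              # running sum of phi * y
        self.window = deque()                      # recent (phi, y) pairs
        self.history = deque(maxlen=n_lags)        # last n_lags raw values

    def update(self, value):
        """Add one stream value and drop the oldest sample once the window is full."""
        value = float(value)
        if len(self.history) == self.n_lags:
            phi = quadratic_features(self.history)
            self.xtx += np.outer(phi, phi)
            self.xty += phi * value
            self.window.append((phi, value))
            if len(self.window) > self.max_window:
                old_phi, old_y = self.window.popleft()
                self.xtx -= np.outer(old_phi, old_phi)
                self.xty -= old_phi * old_y
        self.history.append(value)

    def predict(self):
        """Predict the next stream value, or None if no sample has been fitted yet."""
        if len(self.history) < self.n_lags or not self.window:
            return None
        # lstsq tolerates a rank-deficient X^T X (returns the minimum-norm solution).
        beta, *_ = np.linalg.lstsq(self.xtx, self.xty, rcond=None)
        return float(quadratic_features(self.history) @ beta)
```

Each call to `update` adds one outer product and, once the queue is full, subtracts the oldest one, so the matrices always describe the most recent window and their size depends only on the number of features, not on the length of the stream.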
