A Dynamic Ensemble Method using Adaptive Weight Adjustment for Concept Drifting Streaming Data

컨셉 변동 스트리밍 데이터를 위한 적응적 가중치 조정을 이용한 동적 앙상블 방법

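  • 김영덕 (Department of Computer Engineering, Chungnam National University)
  • 박정희 (Department of Computer Engineering, Chungnam National University)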
  • Received : 2017.04.21
  • Accepted : 2017.06.07
  • Published : 2017.08.15

Abstract

Streaming data is a sequence of data samples generated continuously over time. The data distribution, or concept, can change over time, and such concept drift degrades the performance of a classification model. Adaptive incremental learning maintains classification performance by updating the current model with a weight adjusted to the degree of concept drift. However, selecting a proper weight for a given degree of drift is difficult. In this paper, we propose a dynamic ensemble method based on adaptive weight adjustment according to the degree of concept drift. Experimental results show that the proposed method achieves higher performance than the compared methods.
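
The abstract does not give the details of the proposed algorithm, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes an ensemble of simple incremental classifiers, each updated with a different fixed weight `alpha`, whose votes are weighted by an exponentially decayed record of recent accuracy. All class names, function names, and parameter values (`DecayedMeanClassifier`, `DynamicEnsemble`, `make_stream`, `run`, `decay`) are hypothetical, and the synthetic stream simply swaps the two class centers at one point to imitate an abrupt concept drift.

```python
import random


class DecayedMeanClassifier:
    """Nearest-class-mean classifier on one feature, updated with a fixed weight alpha."""

    def __init__(self, alpha):
        self.alpha = alpha   # larger alpha = faster forgetting of old data
        self.means = {}      # label -> running mean of the feature

    def predict(self, x):
        if not self.means:
            return 0         # arbitrary default before any sample is seen
        return min(self.means, key=lambda c: abs(x - self.means[c]))

    def update(self, x, y):
        if y not in self.means:
            self.means[y] = x
        else:
            self.means[y] += self.alpha * (x - self.means[y])


class DynamicEnsemble:
    """Members differ in update weight; votes are weighted by recent accuracy."""

    def __init__(self, alphas, decay=0.9):
        self.members = [DecayedMeanClassifier(a) for a in alphas]
        self.scores = [1.0] * len(alphas)   # decayed accuracy per member
        self.decay = decay

    def predict(self, x):
        votes = {}
        for member, score in zip(self.members, self.scores):
            label = member.predict(x)
            votes[label] = votes.get(label, 0.0) + score
        return max(votes, key=votes.get)

    def update(self, x, y):
        for i, member in enumerate(self.members):
            # Score each member on the sample before it learns from it.
            hit = 1.0 if member.predict(x) == y else 0.0
            self.scores[i] = self.decay * self.scores[i] + (1 - self.decay) * hit
            member.update(x, y)


def make_stream(n, drift_at, seed=0):
    """Two Gaussian classes whose centers swap at index drift_at."""
    rng = random.Random(seed)
    for t in range(n):
        y = rng.randint(0, 1)
        center = y if t < drift_at else 1 - y
        yield rng.gauss(center * 4.0, 0.5), y


def run(n=2000, drift_at=1000, tail=200):
    """Test-then-train evaluation; returns accuracy on the last `tail` samples."""
    ensemble = DynamicEnsemble(alphas=[0.01, 0.1, 0.5])
    correct = 0
    for t, (x, y) in enumerate(make_stream(n, drift_at)):
        if t >= n - tail and ensemble.predict(x) == y:
            correct += 1
        ensemble.update(x, y)
    return correct / tail


if __name__ == "__main__":
    print("accuracy on final samples:", run())
```

In this sketch, members with a large `alpha` recover quickly after the drift and gain voting weight, while members with a small `alpha` are more stable when the concept is stationary. The ensemble therefore does not need a single hand-picked update weight, which is the difficulty the abstract points out.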

Acknowledgement

Supported by: National Research Foundation of Korea
