Application of an Adaptive Incremental Classifier for Streaming Data

  • 박정희 (Department of Computer Engineering, Chungnam National University)
  • Received : 2016.09.01
  • Accepted : 2016.10.04
  • Published : 2016.12.15

Abstract

In streaming data analysis, the underlying data distribution may change or the concept of interest may drift over time, so the ability to adapt to concept drift is especially valuable in incremental learning. In this paper, we develop a general framework for an adaptive incremental classifier on data streams with concept drift. A distribution representing the performance pattern of a classifier is constructed from the distance between the classifier's confidence score and the class indicator vector. A hypothesis test is then performed to detect concept drift. Based on the estimated p-value, the weight of outdated data is set automatically when the classifier is updated. We apply the proposed method to two types of linear discriminant classifiers. Experimental results on streaming data with concept drift demonstrate that the proposed adaptive incremental learning method substantially improves the prediction accuracy of an incremental classifier.
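
As a rough illustration of the pipeline summarized in the abstract, the Python sketch below computes the distance between a confidence vector and the class indicator vector, tests whether recent distances are worse than a reference set, and turns the p-value into a weight for outdated data. The abstract does not state the test statistic or the mapping from p-value to weight, so the one-sided z-test, the `min(1, 2p)` mapping, and all function names are assumptions made for illustration, not the paper's method. A weighted running mean stands in for the sufficient statistics of a linear discriminant classifier.

```python
import math
from statistics import NormalDist, fmean, pstdev


def indicator_distance(confidence, label):
    """Euclidean distance between a confidence vector and the
    one-hot (class indicator) vector of the true class."""
    return math.sqrt(sum((c - (1.0 if k == label else 0.0)) ** 2
                         for k, c in enumerate(confidence)))


def drift_p_value(reference, recent):
    """One-sided z-test (assumed here): is the mean distance of the
    recent chunk larger than the mean of the reference distances?"""
    mu = fmean(reference)
    sigma = pstdev(reference)
    if sigma == 0.0:
        # Degenerate reference set: any increase counts as drift.
        return 1.0 if fmean(recent) <= mu else 0.0
    z = (fmean(recent) - mu) / (sigma / math.sqrt(len(recent)))
    return 1.0 - NormalDist().cdf(z)


def old_data_weight(p_value):
    """Hypothetical mapping: full weight when the recent chunk is no
    worse than the reference (p >= 0.5), shrinking to 0 as p -> 0."""
    return min(1.0, 2.0 * p_value)


def weighted_mean_update(old_mean, old_n, new_mean, new_n, weight):
    """Merge an old statistic with a new chunk, discounting the
    effective sample count of the old data by `weight`."""
    total = weight * old_n + new_n
    if total == 0:
        return old_mean
    return (weight * old_n * old_mean + new_n * new_mean) / total


if __name__ == "__main__":
    reference = [0.10, 0.20, 0.15, 0.12, 0.18]   # distances before drift
    recent = [0.90, 1.00, 1.10]                  # distances after drift
    p = drift_p_value(reference, recent)
    w = old_data_weight(p)
    print("p-value:", p, "weight of outdated data:", w)
    print("updated mean:", weighted_mean_update(0.0, 10, 1.0, 10, w))
```

With a weight near 1 the update behaves like ordinary incremental learning, while a weight near 0 effectively restarts the statistic from the newest chunk.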

Acknowledgement

Supported by: Chungnam National University
