A Hybrid Data Mining Technique Using Error Pattern Modeling

  • 허준 (SPSS Korea, DataSolution Co., Ltd.)
  • 김종우 (School of Business, College of Business, Hanyang University)
  • Published: 2005.12.01

Abstract

This paper presents a new hybrid data mining technique that uses error pattern modeling to improve classification accuracy when the target variable is binary. The proposed method increases prediction accuracy by combining two different supervised learning methods. Specifically, the algorithm extracts the subset of training cases for which the two methods give inconsistent predictions, and models error patterns from those cases. Based on the error pattern model, the predictions of the two methods are merged to generate the final prediction. The proposed method was tested on 10 practical data sets. The results show that the proposed method outperforms existing methods such as artificial neural networks and decision tree induction.
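
The three steps named in the abstract (extract the inconsistently predicted cases, model the error pattern, merge the predictions) can be sketched in a few lines of Python. This is only one plausible reading of the abstract, not the authors' procedure: the abstract does not specify the base learners or the form of the error pattern model, so the one-feature threshold rules, the toy data, and the names `fit_stump`, `fit_hybrid`, and `arbiter_feature` are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
Classifier = Callable[[Row], int]


def fit_stump(X: List[Row], y: List[int], feature: int) -> Classifier:
    """Fit a one-feature threshold rule (a stand-in for a real learner)."""
    best = None  # (errors, threshold, label predicted at or above threshold)
    for t in sorted({row[feature] for row in X}):
        for above in (0, 1):
            errors = sum(
                (above if row[feature] >= t else 1 - above) != label
                for row, label in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, t, above)
    _, threshold, above = best
    return lambda row: above if row[feature] >= threshold else 1 - above


def fit_hybrid(X: List[Row], y: List[int], model_a: Classifier,
               model_b: Classifier, arbiter_feature: int) -> Classifier:
    """Merge two binary classifiers using a model of their disagreements."""
    # Step 1: keep only the training cases on which the base models disagree.
    disputed = [(row, label) for row, label in zip(X, y)
                if model_a(row) != model_b(row)]
    if not disputed:
        return model_a  # the models never disagree, so nothing to arbitrate

    # Step 2 (assumed form of the error pattern model): with a binary target,
    # exactly one base model is right on each disputed case, so learn a rule
    # that predicts whether model A is the correct one.
    rows = [row for row, _ in disputed]
    a_is_right = [int(model_a(row) == label) for row, label in disputed]
    arbiter = fit_stump(rows, a_is_right, arbiter_feature)

    # Step 3: agree -> shared prediction; disagree -> follow the arbiter.
    def predict(row: Row) -> int:
        a, b = model_a(row), model_b(row)
        if a == b:
            return a
        return a if arbiter(row) == 1 else b

    return predict


def accuracy(model: Classifier, X: List[Row], y: List[int]) -> float:
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


# Toy data (invented): the target follows feature 0 when feature 2 is 0,
# and feature 1 when feature 2 is 1.
X = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0),
     (0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
y = [0, 0, 1, 1, 0, 1, 0, 1]

model_a = fit_stump(X, y, feature=0)
model_b = fit_stump(X, y, feature=1)
hybrid = fit_hybrid(X, y, model_a, model_b, arbiter_feature=2)

print(accuracy(model_a, X, y), accuracy(model_b, X, y), accuracy(hybrid, X, y))
```

On this constructed data each base rule classifies 6 of the 8 training cases correctly, and the merged predictor classifies all 8. These are training-set figures on data built to suit the scheme, so they illustrate the mechanism only and say nothing about the results reported in the paper.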
