A Study on Selection of Split Variable in Constructing Classification Tree

A Study on Split Variable Selection in Decision Trees

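  • 정성석 (Department of Statistical Informatics, Chonbuk National University)
  • 김순영 (Department of Statistical Informatics, Chonbuk National University)
  • 임한필 (Department of Statistical Informatics, Chonbuk National University)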
  • Published : 2004.07.01

Abstract

Selecting the split variable is a key step in constructing a classification tree. The efficiency of a classification tree algorithm can be evaluated by its variable selection bias and its variable selection power. C4.5 has a strongly biased variable selection because variables with many distinct values are favored, and QUEST has low variable selection power when a continuous predictor variable deviates from the normal distribution. In this paper, we propose the SRT algorithm, which overcomes these drawbacks of C4.5 and QUEST. Simulations were performed to compare SRT with C4.5 and QUEST. The results show that SRT has low variable selection bias and robust variable selection power.

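Selecting the split variable is a very important task in building a decision tree. C4.5 suffers from a severe variable selection bias toward continuous variables, and QUEST loses variable selection power when the normality assumption for continuous variables is violated. This paper proposes a statistical robust test algorithm and uses simulation experiments to compare the efficiency of C4.5, QUEST, and the proposed algorithm. The results show that the proposed algorithm is robust in terms of both variable selection bias and variable selection power.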
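
Variable selection bias can be illustrated with a small null simulation. The sketch below is not the SRT algorithm, and it does not reproduce C4.5's actual criterion (C4.5 uses the gain ratio and threshold splits for continuous variables). It assumes a plain information-gain criterion with multiway splits, a binary class, and two hypothetical predictors (`x_few` with 2 distinct values, `x_many` with 10), all generated independently of the class. Since neither predictor carries any information, an unbiased selector would pick each about half the time.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(x, y):
    """Entropy reduction from a multiway split of y on each distinct value of x."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(y) - remainder


def selection_share(n_reps=500, n=100, seed=0):
    """Fraction of null replications in which the 10-level predictor is selected."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_reps):
        # Class labels and both predictors are drawn independently (null case).
        y = [rng.randrange(2) for _ in range(n)]
        x_few = [rng.randrange(2) for _ in range(n)]    # 2 distinct values
        x_many = [rng.randrange(10) for _ in range(n)]  # 10 distinct values
        if information_gain(x_many, y) > information_gain(x_few, y):
            wins += 1
    return wins / n_reps


share = selection_share()
print(f"10-level predictor selected in {share:.0%} of null replications")
```

Under these assumptions the predictor with more distinct values is selected far more often than half the time, which is the kind of bias the abstract attributes to criteria influenced by the number of distinct values. A test-based selector that compares p-values rather than raw gains is one common way to reduce this effect.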
