Bias Reduction in Split Variable Selection in C4.5

  • Published: 2003.12.01

Abstract

In this short communication, we discuss the bias of C4.5 in split variable selection and propose a method to reduce this selection bias among categorical predictor variables. A penalty proportional to the number of categories is applied to the gain that C4.5 uses as its splitting criterion. Empirical comparisons show that the proposed modification of C4.5 reduces the size of the resulting classification trees.
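The bias is easiest to see with an ID-like categorical predictor: a variable with one category per case splits every case into its own pure node and so attains the maximum possible gain, even though it carries no generalizable information. The following is a minimal sketch of the idea, not the authors' implementation. It assumes plain information gain (C4.5 itself normally ranks splits by gain ratio) and a penalty of the hypothetical form `c * k`, where `k` is the number of categories and `c = 0.05` is an arbitrary illustrative constant; the abstract does not give the actual penalty constant. The names `info_gain`, `penalized_gain`, and `select_split_variable` are likewise hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(x, y):
    """Information gain of a multiway split of labels y on categorical predictor x."""
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    # Weighted entropy remaining after splitting on each category of x.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(y) - remainder


def penalized_gain(x, y, c=0.05):
    """Gain minus a penalty proportional to the number of categories of x.

    Assumption: the penalty form c * k and the value of c are illustrative
    choices, not the constants used in the paper.
    """
    k = len(set(x))
    return info_gain(x, y) - c * k


def select_split_variable(predictors, y, score=penalized_gain):
    """Return the name of the predictor whose split scores highest."""
    return max(predictors, key=lambda name: score(predictors[name], y))


# Toy data: 16 cases, two balanced classes.
y = ["a"] * 8 + ["b"] * 8
# x1: two categories that separate the classes well (7a/1b versus 1a/7b).
x1 = ["p"] * 7 + ["q"] + ["p"] + ["q"] * 7
# ids: one category per case, i.e. pure noise as a predictor.
ids = [str(i) for i in range(16)]

predictors = {"x1": x1, "id": ids}
print(select_split_variable(predictors, y, score=info_gain))  # raw gain picks "id"
print(select_split_variable(predictors, y))                   # penalized gain picks "x1"
```

With raw gain, `ids` scores 1.0 bit against roughly 0.456 for `x1`. After the penalty, `ids` drops to 1.0 − 0.05·16 = 0.2 while `x1` keeps about 0.356, so the informative two-category predictor is selected instead.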

Cited by

  1. Prediction of Product Life Cycle Using Data Mining Algorithms: A Case Study of Clothing Industry, vol.40, pp.3, 2014, https://doi.org/10.7232/JKIIE.2014.40.3.291