A Decision Tree Induction using Genetic Programming with Sequentially Selected Features

  • Published: 2006.05.01

Abstract

Decision tree induction is one of the most widely used methods for classification problems. However, top-down tree induction algorithms can become trapped in a local minimum with no reasonable means of escape. Furthermore, if irrelevant or redundant features are included in the data set, such algorithms produce trees that are less accurate than those built from only the relevant features. We propose a hybrid algorithm that generates decision trees using genetic programming with sequentially selected features. The Correlation-based Feature Selection (CFS) method is adopted to find relevant features, which are fed to genetic programming sequentially to find an optimal tree at each iteration. The proposed algorithm produces simpler and more understandable decision trees than other decision tree methods, and it is also effective in producing similar or better trees from a relatively smaller set of features in terms of cross-validation accuracy.
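
The pipeline described above is compact enough to sketch. The Python sketch below is a minimal illustration under loudly stated assumptions: Pearson correlation stands in for the symmetrical-uncertainty measure Hall's CFS uses on discrete attributes, scikit-learn's bundled breast-cancer data is only a convenient test set, and a greedy DecisionTreeClassifier replaces the paper's genetic-programming tree search (performed with gpsys); it is not the authors' implementation. It ranks features with a CFS-style forward search, then feeds the top-k ranked features to the tree inducer for k = 1, 2, ... and keeps the subset whose tree cross-validates best.

    # Minimal sketch, not the authors' implementation: CFS-style forward
    # ranking (Hall, 2000) followed by sequential feature feeding, with a
    # greedy tree inducer standing in for the genetic-programming search.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    def cfs_merit(subset, r_cf, r_ff):
        # Hall's CFS merit: k*mean(r_cf) / sqrt(k + k(k-1)*mean(r_ff)),
        # favoring features correlated with the class but not with each other.
        k = len(subset)
        rcf = np.mean([r_cf[f] for f in subset])
        if k == 1:
            return rcf
        rff = np.mean([r_ff[i, j] for i in subset for j in subset if i != j])
        return k * rcf / np.sqrt(k + k * (k - 1) * rff)

    def cfs_ranking(X, y):
        # Greedy forward search: repeatedly add the feature that maximizes
        # the merit of the subset chosen so far, yielding a full ranking.
        n = X.shape[1]
        r_cf = np.array([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(n)])
        r_ff = np.abs(np.corrcoef(X, rowvar=False))
        remaining, ranked = list(range(n)), []
        while remaining:
            best = max(remaining, key=lambda f: cfs_merit(ranked + [f], r_cf, r_ff))
            ranked.append(best)
            remaining.remove(best)
        return ranked

    X, y = load_breast_cancer(return_X_y=True)   # convenient stand-in data set
    ranked = cfs_ranking(X, y)

    best_k, best_acc = 0, 0.0
    for k in range(1, len(ranked) + 1):          # feed ranked features sequentially
        acc = cross_val_score(DecisionTreeClassifier(random_state=0),
                              X[:, ranked[:k]], y, cv=10).mean()
        if acc > best_acc:                       # keep the best-validating subset
            best_k, best_acc = k, acc
    print(f"top {best_k} CFS-ranked features, CV accuracy {best_acc:.3f}")

In the paper itself the inner inducer is a genetic-programming search over whole tree structures rather than a greedy splitter; the outer loop over nested, CFS-ranked subsets is what "sequentially selected features" refers to, and it is why the method can settle on a relatively small feature set once cross-validation accuracy stops improving.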

References

  1. Aha, D.W. and R.L. Bankert, 'A Comparative Evaluation of Sequential Feature Selection Algorithms,' In Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale, (1995), pp.1-7
  2. Almuallim, H. and T.G. Dietterich, 'Learning with Many Irrelevant Features,' In Proceedings of the Ninth National Conference on Artificial Intelligence, MIT Press, (1991), pp.542-547
  3. Bot, M.C.J. and W.B. Langdon, 'Application of Genetic Programming to Induction of Linear Classification Trees,' European Conference on Genetic Programming EuroGP2000, Lecture Notes in Computer Science 1802, (2000), pp.247-258
  4. Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees, Chapman & Hall/CRC, 1998
  5. Caruana, R. and D. Freitag, 'Greedy Attribute Selection,' In Machine Learning: Proceedings of the Eleventh International Conference, Morgan Kaufmann, (1994), pp.28-36
  6. Cherkauer, K.J. and J.W. Shavlik, 'Growing Simpler Decision Trees to Facilitate Knowledge Discovery,' In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, (1996), pp.315-318
  7. Fu, Z., 'A Computational Study of using Genetic Algorithms to Develop Intelligent Decision Trees,' Proceedings of the 2001 Congress on Evolutionary Computation, Seoul, South Korea, (2001), pp.1382-1387
  8. Hall, M., 'Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning,' In Proceedings of the International Conference on Machine Learning, Morgan Kaufmann, San Francisco, (2000), pp.359-366
  9. Holmes, G. and C.G. Nevill-Manning, 'Feature Selection Via the Discovery of Simple Classification Rules,' In Proceedings of the Symposium on Intelligent Data Analysis, Baden-Baden, Germany, (1995)
  10. John, G.H., R. Kohavi, and K. Pfleger, 'Irrelevant Features and the Subset Selection Problem,' In Machine Learning: Proceedings of the Eleventh International Conference, Morgan Kaufmann, (1994), pp.121-129
  11. Kohavi, R. and G. John, 'Wrappers for Feature Subset Selection,' Artificial Intelligence, Vol.97, No.1-2(1997), pp.273-324
  12. Koller, D. and M. Sahami, 'Hierarchically Classifying Documents using Very Few Words,' In Machine Learning: Proceedings of the Fourteenth International Conference, Morgan Kaufmann, (1997), pp.170-178
  13. Kononenko, I., 'Estimating Attributes: Analysis and Extensions of RELIEF,' In Proceedings of the European Conference on Machine Learning, (1994), pp.171-182
  14. Kononenko, I. and E. Simec, 'Induction of Decision Trees using RELIEFF,' In: Kruse, R., Viertl, R., Riccia, G. Della (eds.), CISM Lecture Notes, Springer-Verlag, (1994), pp.199-220
  15. Koza, J.R., 'Concept Formation and Decision Tree Induction using the Genetic Programming Paradigm,' Parallel Problem Solving from Nature, Berlin: Springer-Verlag, (1991), pp.124-128
  16. Koza, J.R., Genetic Programming, MIT Press, 1992
  17. Lee, S. and M.Y. Huh, 'A Measure of Association for Complex Data,' Computational Statistics and Data Analysis, Vol.44, No.1-2(2003), pp.211-222 https://doi.org/10.1016/S0167-9473(03)00031-8
  18. Murthy, S.K., 'Automatic Construction of Decision Trees from Data: A Multidisciplinary Survey,' Data Mining and Knowledge Discovery, Vol.2(1998), pp.345-389
  19. Papagelis, A. and D. Kalles, 'Breeding Decision Trees using Evolutionary Techniques,' ICML, (2001), pp.393-400
  20. Pfahringer, B., 'Compression-based Feature Subset Selection,' In Proceedings of the IJCAI-95 Workshop on Data Engineering for Inductive Learning, (1995), pp.109-119
  21. Quinlan, J.R., C4.5: Programs for Machine Learning, San Mateo, CA: Morgan Kaufmann, 1993
  22. Setiono, R. and H. Liu, 'Chi2: Feature Selection and Discretization of Numeric Attributes,' In Proceedings of the Seventh IEEE International Conference on Tools with Artificial Intelligence, (1995), pp.388-391
  23. Soule, T., 'Code Growth in Genetic Programming,' PhD thesis, University of Idaho, Moscow, Idaho, USA, 1998
  24. Vafaie, H. and K. De Jong, 'Genetic Algorithms as a Tool for Restructuring Feature Space Representations,' In Proceedings of the International Conference on Tools with Artificial Intelligence, IEEE Computer Society Press, 1995
  25. Witten, I.H. and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2000
  26. http://www.cs.ucl.ac.uk/external/A.Qureshi/gpsys_doc.html
  27. http://www.cs.waikato.ac.nz/ml/weka
  28. http://www.ics.uci.edu/~mlearn/MLRepository.html
  29. http://www.r-project.org