Note on classification and regression tree analysis

분류와 회귀나무분석에 관한 소고

  • Published : 2002.03.01

Abstract

The analysis of large data sets with hundreds of thousands of observations and thousands of independent variables is a formidable computational task. A less parametric method, capable of identifying important independent variables and their interactions, is the tree-structured approach to regression and classification. It gives a graphical and often illuminating way of looking at data in classification and regression problems. In this paper, we review and summarize the methodology used to construct a single tree and multiple trees, as well as the sequential strategy for identifying active compounds in large chemical databases.
