Pattern Selection Using the Bias and Variance of Ensemble

  • Shin, Hyunjung (신현정; Department of Industrial Engineering, Seoul National University)
  • Cho, Sungzoon (조성중; Department of Industrial Engineering, Seoul National University)
  • Published : 2002.03.31

Abstract

A useful pattern is one that contributes much to learning. In a classification problem, patterns near the class boundary surfaces carry more information to the classifier. In a regression problem, patterns near the estimated surface carry more information. In both cases, usefulness is defined only for patterns with no error or negligible error. Using only the useful patterns brings several benefits. First, the memory and time required for learning are reduced. Second, overfitting is avoided even when the learner is oversized. Third, learning results in more stable learners. In this paper, we propose a pattern 'utility index' that measures the utility of an individual pattern. The utility index is based on the bias and variance of a pattern under a trained network ensemble. In classification, a pattern with low bias and high variance gets a high score. In regression, on the other hand, a pattern with low bias and low variance gets a high score. Based on the distribution of the utility index, the original training set is divided into a high-score group and a low-score group, and only the high-score group is then used for training. The proposed method is tested on synthetic and real-world benchmark datasets, where it gives better or at least similar performance.
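
The scoring and splitting steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the utility index formula, so everything here is an assumption made for illustration: the function names (`bias_variance`, `utility_index`, `split_by_utility`), the squared-bias and variance estimates over ensemble member outputs, the simple scoring rules (`variance - bias` for classification, `-(bias + variance)` for regression), and the use of the mean score as the split threshold. Only the qualitative behavior (which combination of bias and variance earns a high score) is taken from the abstract.

```python
from statistics import fmean


def bias_variance(member_outputs, target):
    """Estimate squared bias and variance of one pattern.

    member_outputs: outputs of each ensemble member for this pattern.
    target: the desired output for this pattern.
    """
    mean_output = fmean(member_outputs)
    bias = (mean_output - target) ** 2
    variance = fmean([(o - mean_output) ** 2 for o in member_outputs])
    return bias, variance


def utility_index(bias, variance, task):
    """Hypothetical scoring rule (an assumption, not the paper's formula).

    Classification: low bias and high variance give a high score.
    Regression: low bias and low variance give a high score.
    """
    if task == "classification":
        return variance - bias
    if task == "regression":
        return -(bias + variance)
    raise ValueError(f"unknown task: {task!r}")


def split_by_utility(ensemble_outputs, targets, task):
    """Split pattern indices into high-score and low-score groups.

    The threshold (mean of all scores) is an illustrative choice only.
    """
    scores = [
        utility_index(*bias_variance(outs, t), task)
        for outs, t in zip(ensemble_outputs, targets)
    ]
    threshold = fmean(scores)
    high = [i for i, s in enumerate(scores) if s >= threshold]
    low = [i for i, s in enumerate(scores) if s < threshold]
    return high, low, scores
```

Here `ensemble_outputs[i]` holds the outputs of all ensemble members for pattern `i`. Under these assumptions, a pattern that the ensemble consistently gets wrong (high bias) lands in the low-score group for both tasks, and only the indices in `high` would be passed on for training.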
