Improving an Ensemble Model Using Instance Selection Method

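Korean title: 사례 선택 기법을 활용한 앙상블 모형의 성능 개선 (Improving the Performance of an Ensemble Model Using an Instance Selection Technique)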

  • Min, Sung-Hwan (Department of Business Administration, Hallym University)
  • Received : 2016.02.22
  • Accepted : 2016.03.14
  • Published : 2016.03.31

Abstract

Ensemble classification combines individually trained classifiers to yield more accurate predictions than any individual model. Ensemble techniques are very useful for improving the generalization ability of classifiers. The random subspace ensemble technique is a simple but effective method for constructing ensemble classifiers; it randomly draws a subset of the features for each classifier in the ensemble. The instance selection technique selects critical instances while removing irrelevant and noisy instances from the original dataset. The instance selection and random subspace methods are both well known in the field of data mining and have proven to be very effective in many applications. However, few studies have focused on integrating the two methods. Therefore, this study proposes a new hybrid ensemble model that integrates instance selection and random subspace techniques using genetic algorithms (GAs) to improve the performance of a random subspace ensemble model. GAs are used to select optimal (or near-optimal) instances, which are then used as input data for the random subspace ensemble model. The proposed model was applied to both Kaggle credit data and corporate credit data, and the results were compared with those of other models in terms of classification accuracy, level of diversity, and average classification rate of the base classifiers in the ensemble. The experimental results demonstrated that the proposed model outperformed the other models, including the single model, the instance selection model, and the original random subspace ensemble model.
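The following Python sketch illustrates the general idea of GA-based instance selection feeding a random subspace ensemble. It is not the authors' implementation. The abstract does not specify the base classifier, the fitness function, or the GA settings, so everything here is an assumption made for illustration: the nearest-centroid base learner, the validation-accuracy fitness, the one-point crossover with bit-flip mutation, and all function names and default parameters.

```python
import random

def fit_centroids(X, y, feats):
    """Assumed base learner: nearest-centroid on a feature subset."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    return feats, centroids

def predict_one(model, x):
    feats, centroids = model
    def dist(label):
        return sum((x[f] - c) ** 2 for f, c in zip(feats, centroids[label]))
    return min(sorted(centroids), key=dist)

def fit_random_subspace(X, y, n_models, n_feats, rng):
    """Each ensemble member is trained on its own random feature subset."""
    n_total = len(X[0])
    return [fit_centroids(X, y, rng.sample(range(n_total), n_feats))
            for _ in range(n_models)]

def predict_ensemble(models, x):
    """Majority vote; ties go to the smallest label."""
    votes = [predict_one(m, x) for m in models]
    return max(sorted(set(votes)), key=votes.count)

def fitness(mask, X_tr, y_tr, X_val, y_val, n_models, n_feats, seed):
    """Assumed fitness: validation accuracy of an ensemble trained on the
    instances kept by the 0/1 mask. The subspace seed is fixed so that the
    score of a given mask is deterministic."""
    Xs = [x for x, keep in zip(X_tr, mask) if keep]
    ys = [t for t, keep in zip(y_tr, mask) if keep]
    if len(set(ys)) < len(set(y_tr)):
        return 0.0  # reject masks that drop an entire class
    models = fit_random_subspace(Xs, ys, n_models, n_feats, random.Random(seed))
    hits = sum(predict_ensemble(models, x) == t for x, t in zip(X_val, y_val))
    return hits / len(y_val)

def ga_select_instances(X_tr, y_tr, X_val, y_val, n_models=5, n_feats=2,
                        pop_size=12, generations=10, p_mut=0.05, seed=0):
    """Evolve a 0/1 mask over the training instances (1 = keep)."""
    rng = random.Random(seed)
    n = len(X_tr)

    def score(mask):
        return fitness(mask, X_tr, y_tr, X_val, y_val, n_models, n_feats, seed)

    # Seed the population with the "keep everything" mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=score)
```

Calling `ga_select_instances(X_tr, y_tr, X_val, y_val)` on lists of numeric feature rows returns the best mask found; the instances with mask value 1 would then be used to train the final random subspace ensemble. Because the initial population contains the all-ones mask and the best mask is carried over each generation, the selected subset never scores worse on the validation split than training on all instances, although this says nothing about performance on unseen data.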
