Bankruptcy prediction using an improved bagging ensemble

Korean title: 개선된 배깅 앙상블을 활용한 기업부도예측 (Corporate bankruptcy prediction using an improved bagging ensemble)

  • Min, Sung-Hwan (Department of Business Administration, Hallym University)
  • Received : 2014.11.12
  • Accepted : 2014.12.18
  • Published : 2014.12.30

Abstract

Predicting corporate failure has been an important topic in accounting and finance. Because the costs associated with bankruptcy are high, the accuracy of bankruptcy prediction is of great importance to financial institutions. Many researchers have studied bankruptcy prediction over the past three decades. The current research uses ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers to obtain more accurate predictions than any individual model. Ensemble techniques have been shown to be very useful for improving the generalization ability of a classifier. Bagging is one of the most commonly used methods for constructing ensemble classifiers. In bagging, different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on these bootstrap samples. Instance selection keeps critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining. However, few studies have dealt with the integration of instance selection and bagging. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) to improve the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. GA applies the idea of survival of the fittest by progressively accepting better solutions to the problem. Rather than making incremental changes to a single solution, GA searches by maintaining a population of solutions from which better solutions are created. The initial population of solutions is generated randomly and evolves into the next generation through genetic operators such as selection, crossover, and mutation. The solutions, encoded as strings, are evaluated by the fitness function.
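The bagging procedure described above (bootstrap samples drawn with replacement, one base classifier per sample) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: a nearest-centroid classifier stands in for SVM so that the example needs only the standard library, votes are combined by simple majority, and the toy data and the names `train_centroid`, `predict_centroid`, `bagging_fit`, and `bagging_predict` are hypothetical.

```python
import random
from collections import Counter


def train_centroid(X, y):
    """Stand-in base classifier (NOT an SVM): store the mean vector of each class."""
    sums = {}
    counts = Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [s / counts[c] for s in vec] for c, vec in sums.items()}


def predict_centroid(model, x):
    """Predict the class whose centroid is nearest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def bagging_fit(X, y, n_estimators=11, seed=0):
    """Train one base classifier per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap indices
        models.append(train_centroid([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Combine the base classifiers' predictions by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters, labels 0 and 1.
X = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [0.1, 0.1],
     [1.0, 0.9], [0.9, 1.1], [1.1, 1.0], [1.0, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

models = bagging_fit(X, y, n_estimators=11, seed=0)
print(bagging_predict(models, [0.05, 0.1]), bagging_predict(models, [1.0, 1.0]))
```

An odd number of base classifiers is used here so that a two-class majority vote cannot tie.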
The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA is used to select the optimal instance subset, which serves as the input data of the bagging model. The chromosome is encoded as a binary string representing the instance subset. In this phase, the population size was set to 100 and the maximum number of generations to 150. The crossover rate and mutation rate were set to 0.7 and 0.1, respectively. The prediction accuracy of the model was used as the fitness function of GA. An SVM model is trained on the training data set using the selected instance subset, and its prediction accuracy on the test data set is used as the fitness value in order to avoid overfitting. In the second phase, the optimal instance subset selected in the first phase is used as the input data of the bagging model. SVM was used as the base classifier of the bagging ensemble, and majority voting was used as the combining method. This study applies the proposed model to the bankruptcy prediction problem using a real data set from Korean companies. The research data contains 1,832 externally non-audited firms, of which 916 filed for bankruptcy and 916 did not. Financial ratios categorized as stability, profitability, growth, activity, and cash flow were investigated through a literature review and basic statistical methods, and 8 financial ratios were selected as the final input variables. The whole data set was split into three subsets: training, test, and validation. The proposed model was compared with several alternative models, including a simple individual SVM model, a simple bagging model, and an instance selection based SVM model. McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.
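The first phase described above (a binary chromosome with one bit per training instance, and fitness measured as accuracy on a separate data set) might look roughly like the sketch below. Several details are assumptions rather than the paper's method: a nearest-centroid classifier stands in for SVM, parents are chosen by binary tournament, crossover is single-point, the 0.1 mutation rate is applied per bit, the best chromosome is carried over unchanged (elitism), and the initial population is seeded with the all-ones chromosome. The population size and generation count are reduced from the reported 100 and 150 to keep the demo fast, and all function names and data are hypothetical.

```python
import random
from collections import Counter


def train_centroid(X, y):
    """Stand-in classifier (NOT an SVM): store the mean vector of each class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [s / counts[c] for s in vec] for c, vec in sums.items()}


def predict_centroid(model, x):
    """Predict the class whose centroid is nearest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def fitness(chrom, X_tr, y_tr, X_ho, y_ho):
    """Held-out accuracy of a classifier trained only on instances whose bit is 1."""
    Xs = [x for x, bit in zip(X_tr, chrom) if bit]
    ys = [t for t, bit in zip(y_tr, chrom) if bit]
    if len(set(ys)) < 2:  # a usable subset needs both classes
        return 0.0
    model = train_centroid(Xs, ys)
    hits = sum(predict_centroid(model, x) == t for x, t in zip(X_ho, y_ho))
    return hits / len(y_ho)


def ga_select(X_tr, y_tr, X_ho, y_ho, pop_size=20, generations=15,
              cx_rate=0.7, mut_rate=0.1, seed=0):
    """Search for a binary instance mask that maximizes held-out accuracy."""
    rng = random.Random(seed)
    n = len(X_tr)

    def score(c):
        return fitness(c, X_tr, y_tr, X_ho, y_ho)

    # Assumption: seed the population with the "keep every instance" mask.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism (assumption)
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)  # binary tournament
            p2 = max(rng.sample(pop, 2), key=score)
            child = p1[:]
            if rng.random() < cx_rate:  # single-point crossover
                cut = rng.randrange(1, n)
                child = p1[:cut] + p2[cut:]
            # per-bit mutation: flip each gene with probability mut_rate
            child = [1 - g if rng.random() < mut_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best, score(best)


# Toy data (hypothetical): the last two training instances are mislabeled.
X_tr = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2], [1.0, 0.9],
        [0.9, 1.0], [1.1, 1.0], [1.0, 1.1], [0.1, 0.1], [1.0, 1.0]]
y_tr = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
X_ho = [[0.05, 0.05], [0.15, 0.15], [0.95, 0.95], [1.05, 1.05]]
y_ho = [0, 0, 1, 1]

best, best_fit = ga_select(X_tr, y_tr, X_ho, y_ho)
print(best, best_fit)
```

Because the all-ones chromosome is in the initial population and the best chromosome is always carried forward, the returned mask never scores below the keep-everything baseline on the held-out set. In the second phase, the selected instances would be passed to a bagging ensemble as its training data.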

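Predicting corporate bankruptcy is a very important research topic in finance and accounting. Because the costs incurred by corporate bankruptcy are very large, the accuracy of bankruptcy prediction matters greatly to financial institutions. Recently, research that applies ensemble models, which combine multiple models, to bankruptcy prediction has attracted considerable interest. An ensemble model combines multiple classifiers to achieve better performance than an individual model. Such ensemble classifiers are known to be very useful for improving the generalization performance of a classifier. This paper studies how to improve the performance of bankruptcy prediction models. To this end, it proposes a bagging model that uses instance selection. Instance selection chooses the most representative and relevant data from the original data and removes unnecessary data that could harm the prediction model, which can also be expected to improve predictive performance. Bagging is an ensemble technique that diversifies the base classifiers by varying the training data, and it is known to be simple yet highly effective. Although instance selection and bagging each have the potential to improve model performance, no study has yet combined the two techniques. This study proposes a new model that links instance selection and bagging in order to improve the performance of bankruptcy prediction models. A genetic algorithm is used for optimal instance selection: it finds the optimal combination of selected instances and passes the result to the bagging ensemble model, forming a new type of bagging ensemble. To validate the performance of the proposed ensemble model, it was compared with various models using performance measures such as the ROC curve, AUC, and prediction accuracy. Experiments on real corporate data showed that the proposed model achieved the best performance.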
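As a small illustration of two of the performance measures named above, the sketch below computes AUC and prediction accuracy from classifier scores. AUC is computed here as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half, which equals the area under the ROC curve. The scores, labels, the 0.5 threshold, and the function names are hypothetical and are not taken from the paper's experiments.

```python
def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Share of cases classified correctly when scores >= threshold are predicted as 1."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


# Hypothetical scores (e.g. predicted bankruptcy probabilities) and true labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

auc_value = roc_auc(scores, labels)
acc_value = accuracy(scores, labels)
print(auc_value, acc_value)
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason to report both.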

Cited by

  1. Improving the performance of ensemble models through bootstrap sampling optimization, vol.17, pp.2, 2014, https://doi.org/10.7472/jksii.2016.17.2.49