A Hybrid Under-sampling Approach for Better Bankruptcy Prediction

부도예측 개선을 위한 하이브리드 언더샘플링 접근법

  • Kim, Taehoon (Graduate School of Business IT, Kookmin University) ;
  • Ahn, Hyunchul (Graduate School of Business IT, Kookmin University)
  • 김태훈 (국민대학교 비즈니스IT전문대학원) ;
  • 안현철 (국민대학교 비즈니스IT전문대학원)
  • Received : 2015.05.20
  • Accepted : 2015.06.16
  • Published : 2015.06.30

Abstract

The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance: bankrupt firms are far rarer than healthy ones. The proposed approach is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN eliminates outliers, while OCSVM selects informative training samples from the majority-class data. To validate the approach, we applied it to data on companies not subject to external audit, obtained from H Bank in Korea, and compared the performance of classifiers trained on samples drawn with the proposed under-sampling method against classifiers trained on randomly sampled data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers such as logistic regression, discriminant analysis, decision trees, and support vector machines. They also show that the proposed approach reduces the risk of false negative errors, which carry higher misclassification costs.
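The outlier-elimination idea behind k-RNN can be illustrated with a small sketch. The abstract does not state the exact criterion the authors use, so the code below assumes one common reading: a point is treated as an outlier when no other point lists it among its k nearest neighbors. The function names, the `min_count` threshold, and the toy data are all hypothetical, and the OCSVM selection step is not shown.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]

def reverse_neighbor_counts(points, k):
    """For each point, count how many other points have it among their k nearest neighbors."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in k_nearest(points, i, k):
            counts[j] += 1
    return counts

def drop_krnn_outliers(points, k=2, min_count=1):
    """Keep only points with at least min_count reverse neighbors (assumed outlier rule)."""
    counts = reverse_neighbor_counts(points, k)
    return [p for p, c in zip(points, counts) if c >= min_count]

# Toy majority-class sample (made-up values): a tight cluster plus one isolated point.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
kept = drop_krnn_outliers(majority, k=2)
print(kept)
```

In this toy sample the isolated point has neighbors of its own, but no cluster point counts it as a neighbor, so it is the only one removed. In the hybrid scheme described above, the surviving majority-class points would then be passed to the OCSVM step.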

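Because bankruptcy can cause enormous social and economic losses, predicting it accurately in advance and responding preemptively is one of the most important decision problems in management. The intelligent information systems field has accordingly worked to improve bankruptcy prediction based on firms' financial data. Unfortunately, most prior studies focused mainly on raising prediction accuracy by improving the performance of classification models and did not sufficiently consider other factors. Against this background, this study proposes a new data preprocessing method, specifically an effective sampling method, as a way to improve the accuracy of bankruptcy prediction models. Data used for bankruptcy prediction are generally subject to severe class imbalance, and this study addresses that imbalance with a hybrid under-sampling approach that combines the k-reverse nearest neighbor (k-RNN) and one-class support vector machine (OCSVM) methods. In the proposed approach, k-RNN effectively removes outliers, while OCSVM serves as a means of selecting only information-rich samples from the majority-class data. To verify the performance of the proposed technique, we applied it to building a bankruptcy prediction model for non-externally-audited firms at a Korean bank and compared its performance with that of the commonly used random sampling. The results confirmed that classification accuracy improved for most classification models, including logistic regression, discriminant analysis, decision trees, and SVM, and that for all classification models the false negative error rate, that is, the rate at which failed firms are predicted to be healthy, decreased substantially.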
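The false negative error discussed in the abstract (a bankrupt firm predicted as healthy) can be made concrete with a short sketch. The labels below are made-up toy values, not results from the study, and the function name is hypothetical. The example also shows why accuracy alone can look acceptable on imbalanced data while half of the bankrupt firms are missed.

```python
def false_negative_rate(actual, predicted, positive=1):
    """Share of truly positive (bankrupt) cases that were predicted as negative (healthy)."""
    positives = sum(1 for a in actual if a == positive)
    if positives == 0:
        raise ValueError("no positive cases in 'actual'")
    missed = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    return missed / positives

# Toy labels: 1 = bankrupt, 0 = healthy (illustrative values only).
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

fnr = false_negative_rate(actual, predicted)
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(fnr, accuracy)
```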
