Simultaneous Optimization of KNN Ensemble Models for Bankruptcy Prediction

  • Min, Sung-Hwan (Department of Business Administration, Hallym University)
  • Received : 2016.03.02
  • Accepted : 2016.03.14
  • Published : 2016.03.31

Abstract

Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later, many studies began using artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being used to enhance the accuracy of bankruptcy prediction. Ensemble classification combines multiple classifiers to obtain more accurate predictions than those obtained from individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of a classifier. To enhance the generalization ability of an ensemble model, its base classifiers must be as accurate and as diverse as possible. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method diversifies the base classifiers of an ensemble by selecting a random feature subset from the original feature space for each classifier. Each ensemble member is trained on its randomly chosen feature subset, and the predictions of the ensemble members are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good base classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective at improving on an individual KNN model.
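The random subspace construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's implementation: the function names, the squared Euclidean distance, the fixed subset size, and the majority-vote aggregation are all assumptions made for the example, and only the standard library is used.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k, features):
    """Classify x by majority vote among its k nearest training points,
    using squared Euclidean distance on the chosen features only."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def build_random_subspaces(n_features, subset_size, n_members, rng):
    """Draw one random feature subset per ensemble member."""
    return [sorted(rng.sample(range(n_features), subset_size))
            for _ in range(n_members)]

def ensemble_predict(train_X, train_y, x, k, subspaces):
    """Aggregate the members' predictions by majority vote (an assumed rule)."""
    member_votes = [knn_predict(train_X, train_y, x, k, fs) for fs in subspaces]
    return Counter(member_votes).most_common(1)[0][0]

# Toy data (hypothetical): two well-separated groups described by four features.
train_X = [[0.0, 0.1, 0.0, 0.2], [0.1, 0.0, 0.2, 0.1], [0.2, 0.1, 0.1, 0.0],
           [1.0, 0.9, 1.0, 0.8], [0.9, 1.0, 0.8, 0.9], [0.8, 0.9, 0.9, 1.0]]
train_y = [0, 0, 0, 1, 1, 1]
subspaces = build_random_subspaces(n_features=4, subset_size=2, n_members=5,
                                   rng=random.Random(0))
prediction = ensemble_predict(train_X, train_y, [0.95, 0.95, 0.9, 0.9], 3, subspaces)
```

Because each member sees a different pair of features, members can disagree on borderline cases, which is the source of diversity the method relies on. Here every member shares the same k; the proposed model lets k vary per member.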
The k parameter of the KNN base classifiers and the feature subsets selected for them play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in an ensemble. This study proposes a new ensemble method that improves the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data comprised 1800 externally non-audited firms, of which 900 were bankrupt and 900 were non-bankrupt. Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent-samples t-test with each financial ratio as the input variable and bankruptcy or non-bankruptcy as the output variable. Of these, 24 financial ratios were selected using logistic regression with backward feature selection. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other for guarding against overfitting. Prediction accuracy on this second portion was used as the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performance of the proposed model with that of other models in terms of classification accuracy. The Q-statistic values and average classification accuracies of the base classifiers were also investigated.
The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
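One way to picture the simultaneous optimization is a genetic algorithm whose chromosome stores, for every base classifier, a k value and a binary feature mask, with fitness measured as ensemble accuracy on the held-out portion of the training data. The sketch below follows that reading of the abstract. The encoding, truncation selection, one-point crossover, mutation rate, population size, and every function name are assumptions chosen for illustration; the abstract does not specify them.

```python
import random
from collections import Counter

def knn_predict(train, x, k, mask):
    """k-NN majority vote using only the features whose mask bit is 1."""
    feats = [i for i, bit in enumerate(mask) if bit]
    dists = sorted((sum((row[i] - x[i]) ** 2 for i in feats), y) for row, y in train)
    return Counter(y for _, y in dists[:k]).most_common(1)[0][0]

def ensemble_accuracy(chromosome, train, holdout):
    """Fitness: majority-vote accuracy of the encoded ensemble on held-out data."""
    correct = 0
    for x, y in holdout:
        votes = [knn_predict(train, x, k, mask) for k, mask in chromosome]
        correct += Counter(votes).most_common(1)[0][0] == y
    return correct / len(holdout)

def random_member(n_features, k_choices, rng):
    """One gene: a (k, feature mask) pair for a single base classifier."""
    mask = [rng.randint(0, 1) for _ in range(n_features)]
    if not any(mask):                          # keep at least one feature
        mask[rng.randrange(n_features)] = 1
    return (rng.choice(k_choices), mask)

def mutate(member, k_choices, rate, rng):
    """Resample k and flip mask bits, each with probability `rate`."""
    k, mask = member
    if rng.random() < rate:
        k = rng.choice(k_choices)
    mask = [1 - b if rng.random() < rate else b for b in mask]
    if not any(mask):
        mask[rng.randrange(len(mask))] = 1
    return (k, mask)

def optimize(train, holdout, n_members=5, k_choices=(1, 3, 5), pop_size=12,
             generations=10, rate=0.1, seed=0):
    """Search k values and feature masks for all ensemble members at once."""
    rng = random.Random(seed)
    n_features = len(train[0][0])
    pop = [[random_member(n_features, k_choices, rng) for _ in range(n_members)]
           for _ in range(pop_size)]
    score = lambda c: ensemble_accuracy(c, train, holdout)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]           # truncation selection, elites survive
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_members)  # one-point crossover between members
            child = a[:cut] + b[cut:]
            children.append([mutate(m, k_choices, rate, rng) for m in child])
        pop = elite + children
    best = max(pop, key=score)
    return best, score(best)
```

Calling `optimize(train, holdout)` with two lists of `(features, label)` pairs returns the best chromosome found and its holdout accuracy. Scoring the whole ensemble, rather than each member separately, lets the search trade individual accuracy for diversity, which is the motivation for optimizing k and the feature subsets together.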

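An ensemble classifier combines multiple classifiers to achieve better performance than any individual classifier. Such ensembles are known to be very useful for improving the generalization performance of a single classifier. The random subspace ensemble method randomly selects a set of input variables from the original input variable set for each base classifier, thereby diversifying the base classifiers. A random subspace ensemble with k-nearest neighbor (KNN) base classifiers is known to be effective at improving on the performance of a single model, and its performance is strongly affected by the input variable set randomly selected for each base classifier and by the value of the KNN parameter k. However, although there have been studies on the optimal choice of k or of the input variable set for a single model, there has been no research on optimizing them for an ensemble that uses KNN as its base classifier. This study therefore proposes a new ensemble model that simultaneously optimizes the k parameter value and the input variable set of each base classifier in order to improve the performance of a KNN-based ensemble. The proposed method uses a genetic algorithm to search for the value of k and the input variables that each KNN base classifier should use so that the ensemble as a whole performs best. To validate the proposed model, various experiments were conducted on bankruptcy prediction data from Korean companies, and the results show that the proposed model is more effective than existing ensemble models at diversifying the base classifiers and improving prediction performance.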
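The diversity of base classifiers mentioned in the abstract is measured with the Q-statistic. For two classifiers it is commonly defined from counts of jointly correct and incorrect predictions as Q = (N11·N00 − N01·N10) / (N11·N00 + N01·N10), where N11 counts cases both classify correctly, N00 cases both get wrong, and N10 and N01 cases where only one is correct. The sketch below computes it and its pairwise average. The function names and the choice to return 0.0 when the denominator is zero are assumptions for this example, not details taken from the study.

```python
from itertools import combinations

def q_statistic(correct_a, correct_b):
    """Yule's Q for two classifiers, given per-sample correctness flags."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif not a and not b:
            n00 += 1
        elif a:
            n10 += 1
        else:
            n01 += 1
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return 0.0  # assumption: treat the undefined case as "no association"
    return (n11 * n00 - n01 * n10) / denom

def average_q(correct_matrix):
    """Mean Q over all pairs of classifiers (one row of flags per classifier)."""
    pairs = list(combinations(correct_matrix, 2))
    return sum(q_statistic(a, b) for a, b in pairs) / len(pairs)
```

Q ranges from -1 to 1. Values near 1 mean the two classifiers tend to be right and wrong on the same cases, so lower average Q indicates a more diverse ensemble.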
