• Title/Summary/Keyword: default prediction

The Prediction of Export Credit Guarantee Accident using Machine Learning (기계학습을 이용한 수출신용보증 사고예측)

  • Cho, Jaeyoung; Joo, Jihwan; Han, Ingoo
    • Journal of Intelligence and Information Systems, v.27 no.1, pp.83-102, 2021
  • The government recently announced various policies for developing the big-data and artificial intelligence fields, offering the public a great opportunity through the disclosure of high-quality data held by public institutions. KSURE (Korea Trade Insurance Corporation) is a major public institution for financial policy in Korea and is strongly committed to backing export companies with various support systems. Nevertheless, there are still few realized business models based on big-data analysis. Against this background, this paper aims to develop a new business model for the ex-ante prediction of the likelihood of credit-guarantee insurance accidents. We utilize internal data from KSURE, which supports export companies in Korea, apply machine learning models, and compare the performance of Logistic Regression, Random Forest, XGBoost, LightGBM, and DNN (Deep Neural Network). For decades, researchers have sought better bankruptcy prediction models, since ex-ante prediction is crucial for corporate managers, investors, creditors, and other stakeholders. Work on predicting financial distress or bankruptcy originated with Smith (1930), Fitzpatrick (1932), and Merwin (1942). One of the most famous models is Altman's Z-score model (Altman, 1968), based on multiple discriminant analysis and still widely used in both research and practice; it uses five key financial ratios to predict the probability of bankruptcy within the next two years. Ohlson (1980) introduced the logit model to complement some limitations of previous models, and Elmer and Borowski (1988) developed and examined a rule-based, automated system for the financial analysis of savings and loans. Since the 1980s, researchers in Korea have also examined the prediction of financial distress or bankruptcy: Kim (1987) analyzed financial ratios and developed a prediction model; Han et al. (1995, 1996, 1997, 2003, 2005, 2006) constructed prediction models using various techniques including artificial neural networks; Yang (1996) introduced multiple discriminant analysis and the logit model; and Kim and Kim (2001) utilized artificial neural networks for the ex-ante prediction of insolvent enterprises. Since then, many scholars have tried to predict financial distress or bankruptcy more precisely with diverse models such as Random Forest or SVM. One major distinction of our research from previous work is that we examine the predicted probability of default for each sample case, not only the classification accuracy of each model over the entire sample. Most predictive models in this paper achieve a classification accuracy of about 70% on the entire sample: the LightGBM model shows the highest accuracy of 71.1% and the Logit model the lowest of 69%. However, these results are open to multiple interpretations. In a business context, more emphasis must be placed on minimizing type II error, which causes more harmful operating losses for the guarantee company. Thus, we also compare classification accuracy by splitting the predicted probability of default into ten equal intervals. Examining each interval, the Logit model has the highest accuracy of 100% for the 0~10% interval of predicted default probability, but a relatively low accuracy of 61.5% for the 90~100% interval. In contrast, Random Forest, XGBoost, LightGBM, and DNN show more desirable results: higher accuracy in both the 0~10% and 90~100% intervals, but lower accuracy around the 50% interval. Regarding the distribution of samples across intervals, both the LightGBM and XGBoost models place a relatively large number of samples in the 0~10% and 90~100% intervals. Although the Random Forest model has an advantage in classification accuracy for a small number of cases, LightGBM or XGBoost could be more desirable models since they classify a large number of cases into the two extreme intervals, even allowing for their relatively lower accuracy there. Considering the importance of type II error and total prediction accuracy, XGBoost and DNN show superior performance, followed by Random Forest and LightGBM, while logistic regression performs worst. Each predictive model nevertheless has a comparative advantage under different evaluation standards; for instance, the Random Forest model shows almost 100% accuracy for samples expected to have a high probability of default. Collectively, a more comprehensive ensemble that combines multiple machine learning classifiers with majority voting could maximize overall performance.
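
To make the decile comparison concrete, here is a minimal sketch (not the authors' code; the Random Forest model and synthetic data are illustrative assumptions) that bins predicted default probabilities into ten equal intervals and reports per-interval accuracy and sample counts:

```python
# Bin each model's predicted default probability into ten equal intervals
# and report per-interval classification accuracy and sample counts.
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
prob = model.predict_proba(X_te)[:, 1]          # predicted probability of default
pred = (prob >= 0.5).astype(int)                # standard 0.5 classification cut-off

df = pd.DataFrame({"prob": prob, "correct": pred == y_te})
df["interval"] = pd.cut(df["prob"], bins=np.linspace(0, 1, 11), include_lowest=True)
summary = df.groupby("interval", observed=False)["correct"].agg(["mean", "size"])
print(summary)  # per-interval accuracy ("mean") and number of samples ("size")
```

The "size" column is the distributional property the abstract uses to argue for LightGBM and XGBoost: a model that pushes many cases into the two extreme intervals can be preferable even at somewhat lower per-interval accuracy.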

Corporate Default Prediction Model Using Deep Learning Time Series Algorithm, RNN and LSTM (딥러닝 시계열 알고리즘 적용한 기업부도예측모형 유용성 검증)

  • Cha, Sungjae; Kang, Jungseok
    • Journal of Intelligence and Information Systems, v.24 no.4, pp.1-32, 2018
  • Beyond stakeholders such as managers, employees, creditors, and investors of bankrupt companies, corporate defaults have a ripple effect on the local and national economy. Before the Asian financial crisis, the Korean government analyzed only SMEs and tried to improve the forecasting power of a single default prediction model rather than developing a variety of corporate default models. As a result, even large corporations, the so-called 'chaebol enterprises', went bankrupt. Even afterwards, analyses of past corporate defaults focused on specific variables, and when the government restructured companies immediately after the global financial crisis, it concentrated on a few main variables such as the debt ratio. A multifaceted study of corporate default prediction models is essential to serve diverse interests and to avoid sudden, total collapses such as the 'Lehman Brothers case' of the global financial crisis. The key variables driving corporate defaults vary over time: comparing Beaver's (1967, 1968) and Altman's (1968) analyses with Deakin's (1972) study confirms that the major factors affecting corporate failure have changed, and Grice (2001) likewise found shifts in the importance of predictive variables using Zmijewski's (1984) and Ohlson's (1980) models. However, past studies use static models, most of which do not consider changes that occur over time. Therefore, to construct consistent prediction models, it is necessary to compensate for time-dependent bias by means of a time series analysis algorithm that reflects dynamic change. Centered on the global financial crisis, which had a significant impact on Korea, this study uses ten years of annual corporate data from 2000 to 2009, divided into training, validation, and test sets of 7, 2, and 1 years respectively. To construct a bankruptcy model that stays consistent through time, we first train a deep learning time series model on pre-crisis data (2000~2006). Parameter tuning of the existing models and the deep learning time series algorithm is conducted on validation data that include the financial crisis period (2007~2008), yielding a model whose behavior matches the training results and shows excellent predictive power. Each bankruptcy prediction model is then re-trained on the combined training and validation data (2000~2008), applying the optimal parameters from the validation step. Finally, the models trained over those nine years are evaluated and compared on test data (2009), demonstrating the usefulness of the deep learning time series approach to corporate default prediction. In addition, by adding Lasso regression to the existing variable-selection methods (multiple discriminant analysis and the logit model), we show that the deep learning time series model is useful for robust corporate default prediction across all three variable bundles. The definition of bankruptcy follows Lee (2015). Independent variables include financial information such as the financial ratios used in previous studies. Multivariate discriminant analysis, the logit model, and the Lasso regression model are used to select the optimal variable groups. The multivariate discriminant analysis model proposed by Altman (1968), the logit model proposed by Ohlson (1980), non-time-series machine learning algorithms, and deep learning time series algorithms are then compared. Corporate data suffer from nonlinear variables, multicollinearity among variables, and a lack of data. The logit model addresses nonlinearity, the Lasso regression model mitigates the multicollinearity problem, and the deep learning time series algorithm, using a variable data-generation method, compensates for the lack of data. Big-data technology is moving from simple human analysis to automated AI analysis and, eventually, to intertwined AI applications. Although the study of corporate default prediction models using time series algorithms is still at an early stage, the deep learning algorithm builds default prediction models much faster than regression analysis and achieves stronger predictive power. Amid the Fourth Industrial Revolution, the Korean government and governments overseas are working to integrate such systems into the everyday life of their nations, yet deep learning time series research for the financial industry remains insufficient. As an initial study on deep learning time series analysis of corporate defaults, it is hoped that this work will serve as comparative reference material for non-specialists beginning to combine financial data with deep learning time series algorithms.
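
A hedged sketch of the core modeling idea follows (this is not the authors' architecture; shapes, hyperparameters, and the placeholder data are assumptions): each firm's annual financial ratios form a short time series that an LSTM summarizes into a default probability.

```python
# Feed each firm's annual financial ratios as a short time series into an
# LSTM that outputs a default probability. All sizes are illustrative.
import numpy as np
import tensorflow as tf

n_firms, n_years, n_ratios = 1000, 7, 12      # e.g., 7 annual observations per firm
X = np.random.rand(n_firms, n_years, n_ratios).astype("float32")  # placeholder data
y = np.random.randint(0, 2, size=n_firms)     # 1 = default, 0 = non-default

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_years, n_ratios)),
    tf.keras.layers.LSTM(32),                        # summarizes the ratio history
    tf.keras.layers.Dense(1, activation="sigmoid"),  # default probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.2, verbose=0)
```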

Option-type Default Forecasting Model of a Firm Incorporating Debt Structure, and Credit Risk (기업의 부채구조를 고려한 옵션형 기업부도예측모형과 신용리스크)

  • Won, Chae-Hwan; Choi, Jae-Gon
    • The Korean Journal of Financial Management, v.23 no.2, pp.209-237, 2006
  • Because previous default forecasting models evaluate the probability of default using accounting data at book values, they cannot reflect market changes sensitively and lack a strong theoretical background. Market-information-based models, by contrast, not only use market data for default prediction but also rest on solid theory such as Black-Scholes (1973) option theory. Hence, many firms now use market-based models such as KMV to forecast their default probabilities and manage their credit risk. Korean firms also widely use the KMV model, in which the default point is defined as liquid debt plus 50% of fixed debt. Since the debt structures of Korean and American firms differ significantly, Korean firms should apply the KMV model with care. In this study, we empirically investigate the importance of debt structure and find the following: first, in Korea, fixed debts are more important than liquid debts for accurate prediction of default; second, the proportion of fixed debt should be less than 20% when the default point is calculated for Korean firms, in contrast to the KMV convention. These findings give Korean firms valuable implications for default forecasting and the management of credit risk.
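
The abstract's main quantitative point, lowering the fixed-debt weight in the default point from the conventional 50% to under 20% for Korean firms, can be illustrated with a small sketch; all figures below are made-up example values, not data from the paper.

```python
# KMV-style default point and Merton/KMV distance to default (DD), with the
# fixed-debt weight w as an adjustable parameter. Values are hypothetical.
import math

def default_point(liquid_debt: float, fixed_debt: float, w: float = 0.5) -> float:
    """Default point = liquid debt + w * fixed debt (w = 0.5 by KMV convention)."""
    return liquid_debt + w * fixed_debt

def distance_to_default(V: float, sigma_V: float, dp: float, mu: float = 0.0,
                        T: float = 1.0) -> float:
    """Distance to default for asset value V and asset volatility sigma_V."""
    return (math.log(V / dp) + (mu - 0.5 * sigma_V**2) * T) / (sigma_V * math.sqrt(T))

V, sigma_V = 120.0, 0.25          # hypothetical asset value and volatility
liquid, fixed = 40.0, 50.0        # hypothetical debt structure
for w in (0.5, 0.2):              # conventional KMV weight vs. the paper's bound
    dp = default_point(liquid, fixed, w)
    print(f"fixed-debt weight {w:.0%}: default point {dp:.1f}, "
          f"DD {distance_to_default(V, sigma_V, dp):.2f}")
```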

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul; Kim, Jaeseong; Choi, Sangok
    • Journal of Intelligence and Information Systems, v.26 no.2, pp.105-129, 2020
  • This study uses corporate data from 2012 to 2018, when K-IFRS was applied in earnest, to predict default risk. The data comprise 10,545 rows and 160 columns: 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial-ratio indices. Unlike most prior studies, which used the default event itself as the basis for learning about default risk, this study calculated default risk from each company's market capitalization and stock-price volatility based on the Merton model. This resolves the data-imbalance problem caused by the scarcity of default events, a noted limitation of the existing methodology, and captures the differences in default risk that exist among ordinary companies. Because learning was conducted using only corporate information that is also available for unlisted companies, the default risk of unlisted companies without stock-price information can be derived appropriately. The approach can therefore provide stable default risk assessment services to companies that are difficult to rate with traditional credit rating models, such as small and medium-sized companies and startups. Although the use of machine learning to predict corporate default risk has been studied actively in recent years, model-bias issues exist because most studies make predictions with a single model. A stable and reliable valuation methodology is required, given that default risk information is used very widely in the market and sensitivity to differences in default risk is high; strict standards are also required for the method of calculation. The credit rating method stipulated by the Financial Services Commission in the Financial Investment Business Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience with credit ratings as well as changes in future market conditions. This study reduces the bias of individual models by utilizing stacking ensemble techniques that synthesize various machine learning models, capturing the complex nonlinear relationships between default risk and corporate information while retaining the short computation times that are an advantage of machine-learning-based default risk prediction. To produce the sub-model forecasts used as input to the Stacking Ensemble model, the training data were split into seven parts and the sub-models were trained on these splits. To benchmark the Stacking Ensemble model's predictive power, Random Forest, MLP, and CNN models were trained on the full training data, and each model was verified on the test set. The analysis showed that the Stacking Ensemble model exceeded the predictive power of the Random Forest model, the best-performing single model. Next, to check for statistically significant differences between the Stacking Ensemble model and each individual model, pairs were constructed between the Stacking Ensemble model's forecasts and each individual model's forecasts. Because the Shapiro-Wilk normality test showed that none of the pairs followed a normal distribution, the nonparametric Wilcoxon rank-sum test was used to check whether the two forecasts making up each pair differed significantly. The forecasts of the Stacking Ensemble model differed significantly from those of the MLP and CNN models. In addition, this study offers a methodology by which existing credit rating agencies can adopt machine-learning-based default risk prediction, since traditional credit rating models can also be included as sub-models in calculating the final default probability. The stacking ensemble techniques proposed here can also be designed, through the combination of various sub-models, to meet the requirements of the Financial Investment Business Regulations. We hope this research will be used to increase practical adoption by overcoming the limitations of existing machine-learning-based models.
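
A minimal sketch of this pipeline, under assumed data and models (sklearn's StackingRegressor stands in for the authors' implementation): sub-model forecasts are produced via seven-fold out-of-fold predictions, and paired forecasts are then tested with Shapiro-Wilk and the Wilcoxon rank-sum test.

```python
# Stacking ensemble over sub-models, then the paper's two-step comparison:
# Shapiro-Wilk on the paired differences, Wilcoxon rank-sum on the forecasts.
import numpy as np
from scipy import stats
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=2000, n_features=30, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
                ("mlp", MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000,
                                     random_state=0))],
    final_estimator=Ridge(),
    cv=7,  # mirrors the paper's seven-way split for sub-model forecasts
).fit(X_tr, y_tr)

rf_only = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
f_stack, f_rf = stack.predict(X_te), rf_only.predict(X_te)
print("Shapiro-Wilk p:", stats.shapiro(f_stack - f_rf).pvalue)   # normality check
print("Wilcoxon rank-sum p:", stats.ranksums(f_stack, f_rf).pvalue)
```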

A Systematic Analysis on Default Risk Based on Delinquency Probability

  • Kim, Gyoung Sun; Shin, Seung Woo
    • Korea Real Estate Review, v.28 no.3, pp.21-35, 2018
  • The recent performance of residential mortgages has demonstrated how default risk operates separately from prepayment risk. In this study, we investigated the determinants of borrowers' early-termination decisions through default, using the mortgage performance data released by Freddie Mac for securitized mortgage loans from January 2011 to September 2013. We estimated a Cox-type proportional hazard model with a single risk on the fundamental factors associated with the default option for individual mortgages. We propose a mortgage default model with two specifications of delinquency: one using a delinquency binary variable and the other using a delinquency probability. We compared the two specifications with respect to goodness-of-fit in the spirit of Vuong (1989), in both the overlapping and nested model cases, and found that the model with our proposed delinquency probability variable showed a statistically significant advantage over the benchmark model with delinquency dummy variables. A default prediction power test based on the method proposed in Shumway (2001) also showed much stronger performance from the proposed model.
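
A hedged sketch of a Cox proportional-hazards default model in the spirit of the paper, using the lifelines library; the column names and simulated data below are assumptions, not the Freddie Mac fields the authors used.

```python
# Fit a Cox proportional-hazards model where default is the single event of
# interest; 'delinq_prob' plays the role of the paper's delinquency probability.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "duration": rng.integers(1, 36, n),       # months until default or censoring
    "default": rng.integers(0, 2, n),         # event indicator (1 = default)
    "ltv": rng.uniform(0.4, 1.1, n),          # loan-to-value (illustrative covariate)
    "delinq_prob": rng.uniform(0.0, 1.0, n),  # delinquency probability covariate
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="default")
cph.print_summary()   # hazard ratios for LTV and delinquency probability
```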

Explainable Credit Default Prediction Using SHAP (SHAP을 이용한 설명 가능한 신용카드 연체 예측)

  • Minjoong Kim; Seungwoo Kim; Jihoon Moon
    • Proceedings of the Korean Society of Computer Information Conference, 2024.01a, pp.39-40, 2024
  • This study proposes a method that uses SHAP (SHapley Additive exPlanations) to strengthen the interpretability of machine learning models that predict the delinquency probability of credit card users. By analyzing large-scale credit card data, it aims to clarify how factors such as a customer's age, gender, marital status, and payment history affect the occurrence of delinquency. Building on this study, financial institutions can perform more accurate risk management and lay the groundwork for providing customized services to their customers.
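
A minimal sketch of the SHAP workflow the abstract describes, with an assumed dataset and feature names (the study's actual data are not reproduced here):

```python
# Train a delinquency classifier on synthetic credit-card features, then
# explain it with SHAP's TreeExplainer and a global summary plot.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 3000
X = pd.DataFrame({
    "age": rng.integers(20, 70, n),
    "is_married": rng.integers(0, 2, n),
    "months_delinquent_last_year": rng.integers(0, 12, n),
    "credit_utilization": rng.uniform(0, 1, n),
})
# Synthetic label: past delinquency drives future delinquency, plus noise.
y = (X["months_delinquent_last_year"] + rng.normal(0, 2, n) > 6).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)     # fast, exact SHAP for tree ensembles
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)         # per-feature impact on delinquency
```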

User Similarity Measurement Using Entropy and Default Voting Prediction in Collaborative Filtering (엔트로피와 Default Voting을 이용한 협력적 필터링에서의 사용자 유사도 측정)

  • 조선호; 김진수; 이정현
    • Proceedings of the Korean Information Science Society Conference, 2001.10b, pp.115-117, 2001
  • Existing Internet websites apply collaborative filtering, which provides personalized services for each user in order to maximize user satisfaction. Collaborative filtering predicts and recommends items that match a user's taste, and the Pearson correlation coefficient is generally used to compute the correlation with other users who have similar preferences. However, Pearson-correlation-based methods have two drawbacks: the correlation can be computed only over items the users have rated, and prediction accuracy is low. This paper therefore proposes a method that complements Pearson-correlation-based prediction to obtain more accurate user similarity. In the proposed method, entropy is applied to the preferences of the items each user has rated, and Default Voting is used for items on which a user has expressed no preference, implementing a more accurate collaborative filtering scheme.
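
A hedged sketch of the abstract's two ingredients, with invented ratings and a neutral default vote of 2.5 (both assumptions): unrated items are filled by Default Voting before Pearson similarity is computed, and a user's rating entropy is available as a weight.

```python
# Pearson similarity over a ratings matrix where missing ratings are filled
# with a default vote, plus the entropy of a user's observed ratings.
import numpy as np

def pearson(u: np.ndarray, v: np.ndarray) -> float:
    u, v = u - u.mean(), v - v.mean()
    denom = np.sqrt((u**2).sum() * (v**2).sum())
    return float((u * v).sum() / denom) if denom else 0.0

def entropy(ratings: np.ndarray) -> float:
    vals, counts = np.unique(ratings[~np.isnan(ratings)], return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# rows = users, columns = items; NaN marks items the user has not rated
R = np.array([[5.0, np.nan, 3.0, 1.0],
              [4.0, 2.0, np.nan, 1.0],
              [1.0, 5.0, 4.0, np.nan]])

DEFAULT_VOTE = 2.5                       # assumed neutral rating for unrated items
R_filled = np.where(np.isnan(R), DEFAULT_VOTE, R)

sim = pearson(R_filled[0], R_filled[1])  # similarity over all items, not just co-rated
weight = entropy(R[1])                   # entropy of the neighbour's observed ratings
print(f"similarity {sim:.3f}, entropy weight {weight:.3f}")
```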

A Study on Default Prediction Model: Focusing on The Imbalance Problem of Default Data (부도 예측 모형 연구: 부도 데이터의 불균형 문제를 중심으로)

  • Jinsoo Park; Kangbae Lee; Yongbok Cho
    • Information Systems Review, v.26 no.2, pp.169-183, 2024
  • This study summarizes improvement strategies for addressing the imbalance problem in observed default data that must be considered when constructing a default model and compares and analyzes the performance improvement effects using data resampling techniques and default threshold adjustments. Empirical analysis results indicate that as the level of imbalance resolution in the data increases, and as the default threshold of the model decreases, the recall of the model improves. Conversely, it was found that as the level of imbalance resolution in the data decreases, and as the default threshold of the model increases, the precision of the model improves. Additionally, focusing solely on either recall or precision when addressing the imbalance problem results in a phenomenon where the other performance evaluation metrics decrease significantly due to the trade-off relationship. This study differs from most previous research by focusing on the relationship between improvement strategies for the imbalance problem of default data and the enhancement of default model performance. Moreover, it is confirmed that to enhance the practical usability of the default model, different improvement strategies for the imbalance problem should be applied depending on the main purpose of the model, and there is a need to utilize the Fβ Score as a performance evaluation metric.
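
The two levers the study compares, resampling the minority class and moving the default threshold, can be sketched as follows (illustrative data, not the paper's; the oversampling factor and thresholds are assumptions):

```python
# Oversample the rare default class, then show the recall/precision trade-off
# as the default threshold moves, scored with the F-beta measure.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

X, y = make_classification(n_samples=10000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample defaults (class 1) in the training set to soften the imbalance.
idx = np.flatnonzero(y_tr == 1)
X_up, y_up = resample(X_tr[idx], y_tr[idx], n_samples=idx.size * 5, random_state=0)
X_bal, y_bal = np.vstack([X_tr, X_up]), np.concatenate([y_tr, y_up])

prob = LogisticRegression(max_iter=1000).fit(X_bal, y_bal).predict_proba(X_te)[:, 1]
for thr in (0.5, 0.3):                   # lower threshold -> higher recall
    pred = (prob >= thr).astype(int)
    print(f"thr={thr}: recall={recall_score(y_te, pred):.2f} "
          f"precision={precision_score(y_te, pred):.2f} "
          f"F2={fbeta_score(y_te, pred, beta=2):.2f}")
```

Running this makes the trade-off in the abstract visible: lowering the threshold raises recall while precision falls, which is why the study recommends choosing the strategy, and an Fβ score, according to the model's main purpose.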

Optimization of SWAN Wave Model to Improve the Accuracy of Winter Storm Wave Prediction in the East Sea

  • Son, Bongkyo; Do, Kideok
    • Journal of Ocean Engineering and Technology, v.35 no.4, pp.273-286, 2021
  • In recent years, as human casualties and property damage caused by hazardous waves have increased in the East Sea, precise wave prediction skill has become necessary. In this study, the Simulating WAves Nearshore (SWAN) third-generation numerical wave model was calibrated and optimized to enhance the accuracy of winter storm wave prediction in the East Sea. We used Source Term 6 (ST6), based on physical observations from a large-scale experiment conducted in Australia, and compared its results to Komen's formulation, the default in SWAN. As input wind data, we used the Korea Meteorological Administration's (KMA's) operational meteorological model, the Regional Data Assimilation and Prediction System (RDAPS); the European Centre for Medium-Range Weather Forecasts' newest fifth-generation reanalysis data (ERA5); and the Japan Meteorological Agency's (JMA's) meso-scale forecast data. We analyzed the accuracy of each configuration by comparing its results to observation data. For quantitative analysis and assessment, observed wave data at six locations from KMA and the Korea Hydrographic and Oceanographic Agency (KHOA) were used, and statistical analysis was conducted to assess model accuracy. As a result, the ST6 runs had a smaller root mean square error and a higher correlation coefficient than the default model for significant wave height prediction, although the peak wave period results were incoherent across models and locations. Among the wind inputs, the simulation using ERA5 showed the most accurate results overall, but it underestimated the wave height of high wave events compared with the simulations using the RDAPS and JMA meso-scale models, showing that the spatial resolution of the wind plays the more significant role in predicting high wave events. Nevertheless, the numerical model optimized in this study showed limitations in predicting high waves that rise rapidly in time due to meteorological events, suggesting that further research is necessary to enhance the accuracy of wave prediction under various climate conditions, such as extreme weather.
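
The verification step described above reduces to computing the RMSE and correlation coefficient between modeled and observed significant wave height per configuration; the sketch below uses made-up numbers purely to show the shape of that comparison.

```python
# Compare modeled significant wave height (Hs) against observations with
# RMSE and the linear correlation coefficient, per model configuration.
import numpy as np

def verify(model_hs: np.ndarray, obs_hs: np.ndarray) -> tuple[float, float]:
    rmse = float(np.sqrt(np.mean((model_hs - obs_hs) ** 2)))
    corr = float(np.corrcoef(model_hs, obs_hs)[0, 1])
    return rmse, corr

obs = np.array([1.2, 2.5, 3.8, 4.6, 3.1])          # observed Hs (m), illustrative
runs = {"Komen (default)": np.array([1.0, 2.1, 3.2, 3.9, 2.8]),
        "ST6":             np.array([1.1, 2.4, 3.6, 4.4, 3.0])}
for name, hs in runs.items():
    rmse, corr = verify(hs, obs)
    print(f"{name}: RMSE={rmse:.2f} m, r={corr:.3f}")
```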

A Video Sequence Coding Using Dynamic Selection of Unrestricted Motion Vector Mode in H.263 (H.263의 비제한 움직임 벡터 모드의 동적 선택을 이용한 영상 부호화)

  • 박성한; 박성태
    • Journal of the Korea Computer Industry Society, v.2 no.8, pp.1075-1088, 2001
  • In this paper, we propose a method for the dynamic selection of the unrestricted motion vector (UMV) mode or the default prediction mode (DPM) in an H.263 bit stream. For this, we use the error of the compensated image and the magnitude of the motion vectors. In the proposed strategy, the UMV mode is applied dynamically per frame according to the average magnitude of the motion vectors and the error of the compensated image. This scheme improves image quality compared with using the fixed UMV or DPM mode alone, and the number of search points is greatly reduced compared with UMV. The proposed method is especially profitable for long video sequences with local camera movement.
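
A hedged sketch of the per-frame decision rule the abstract describes; the thresholds below are invented for illustration and are not taken from the paper.

```python
# Pick UMV mode for a frame when average motion is large or the motion-
# compensation error is high; otherwise stay in the default prediction mode.
import numpy as np

MV_THRESHOLD = 4.0        # assumed average motion-vector magnitude (pixels)
SAD_THRESHOLD = 1500.0    # assumed mean compensation error per macroblock

def select_mode(motion_vectors: np.ndarray, sad_per_block: np.ndarray) -> str:
    """Return 'UMV' or 'DPM' for one frame from its motion statistics."""
    avg_mv = float(np.linalg.norm(motion_vectors, axis=1).mean())
    avg_sad = float(sad_per_block.mean())
    return "UMV" if avg_mv > MV_THRESHOLD or avg_sad > SAD_THRESHOLD else "DPM"

mvs = np.array([[5.0, 2.0], [-6.0, 1.0], [4.0, -3.0]])   # toy per-block vectors
sads = np.array([1200.0, 900.0, 1100.0])
print(select_mode(mvs, sads))   # -> 'UMV' (large average motion in this frame)
```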
