• Title/Summary/Keyword: Credit Prediction

The Prediction of Purchase Amount of Customers Using Support Vector Regression with Separated Learning Method (Support Vector Regression에서 분리학습을 이용한 고객의 구매액 예측모형)

  • Hong, Tae-Ho;Kim, Eun-Mi
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.213-225 / 2010
  • With the rapid growth of information technology, data mining has enabled managers to offer personalized and differentiated marketing programs to their customers. Most studies on customer response have focused on predicting whether customers will respond to a marketing promotion, because marketing managers are eager to identify likely respondents. Many data mining studies address binary decision problems such as bankruptcy prediction, network intrusion detection, and credit card fraud detection, and customer response has been studied with similar methods because it is also a dichotomous decision problem. A number of competing data mining techniques, such as neural networks, support vector machines (SVM), decision trees, logit, and genetic algorithms, have been applied to predicting customer response to marketing promotions. Marketing managers have also tried to classify their customers with quantitative measures such as recency, frequency, and monetary value obtained from their transaction databases. These measures indicate how recently customers purchased, how frequently they purchased in a given period, and how much they spent per purchase. Building on segmented customers, we propose an approach that differentiates customers within the same rating. Our approach uses support vector regression (SVR) to forecast the purchase amount of customers in each customer rating. The sample comprised 41,924 customers extracted from the DMEF04 data set who had purchased at least once in the last two years. We classified customers from the first to the fifth rating based on their purchase amount after a marketing promotion.
Here, the first rating contains customers with large purchase amounts and the fifth rating contains non-respondents to the promotion. The proposed model forecasts the purchase amount of customers within the same rating, so marketing managers can design a differentiated and personalized marketing program for each customer even when customers belong to the same rating. In addition, we propose a more efficient learning method that separates the learning samples. We compared the proposed learning method with a general learning method for SVRs. LMW (Learning Method using Whole data for purchasing customers) is a general learning method for forecasting the purchase amount of customers. The proposed method, LMS (Learning Method using Separated data for classification purchasing customers), builds four different SVR models, one for each class of customers. To evaluate the models, we calculated the MAE (Mean Absolute Error) and MAPE (Mean Absolute Percent Error) of each model's purchase amount predictions. For LMW, the overall MAPE was 0.670 and the best MAPE was 0.327. For LMS, the best MAPE was 0.275, and LMS outperformed LMW in every class of customers. The proposed LMS model therefore forecast the purchase amount of customers in each class more accurately than LMW. Our approach should be useful for marketing managers when they need to select customers for a promotion: even when customers belong to the same class, managers can offer them a differentiated and personalized marketing promotion.
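The two error measures named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the purchase amounts, forecasts, and rating labels are hypothetical, and MAPE is expressed as a fraction, assuming that is how figures such as 0.275 are reported.

```python
from statistics import mean

def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def mape(actual, predicted):
    """Mean absolute percent error as a fraction; actual values must be non-zero."""
    return mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))

# Hypothetical purchase amounts and forecasts, grouped by customer rating.
# In a separated-learning setup, each rating would have its own fitted model.
by_rating = {
    1: ([200.0, 400.0], [180.0, 440.0]),
    2: ([100.0, 50.0], [90.0, 60.0]),
}

for rating, (actual, predicted) in by_rating.items():
    print(rating, mae(actual, predicted), mape(actual, predicted))
```

Reporting the errors per rating, rather than pooled, mirrors the per-class comparison between LMS and LMW described in the abstract.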

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions, including linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). SVM in particular is recognized as a new and promising method for classification and regression analysis. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and achieves high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, so overfitting is unlikely to occur with SVM. Furthermore, SVM does not require many data samples for training, since it builds prediction models using only the representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance.
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well in multi-class classification problems as SVM does in binary-class classification. Second, approximation algorithms (e.g. decomposition methods, the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class problems, but they can deteriorate classification performance. Third, multi-class prediction problems are prone to data imbalance, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often produce a default classifier with a skewed boundary, which reduces classification accuracy. SVM ensemble learning is one machine learning approach for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Boosting thus attempts to produce new classifiers that are better able to predict the examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can take into account the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In terms of arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of the classifiers over the 30 folds differs significantly. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
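The difference between an overall accuracy and a geometric mean-based accuracy on imbalanced classes can be sketched as follows. The abstract does not give the paper's exact definitions, so this sketch assumes overall accuracy is the fraction of correct predictions and geometric mean accuracy is the geometric mean of per-class recall; the labels are hypothetical.

```python
import math

def per_class_recall(y_true, y_pred):
    """Fraction of correctly predicted instances for each class in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return recalls

def overall_accuracy(y_true, y_pred):
    """Share of all instances predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recall; zero if any class is never hit."""
    r = per_class_recall(y_true, y_pred)
    return math.prod(r) ** (1 / len(r))

# Hypothetical imbalanced labels: a default classifier predicts only class "A".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10

print(overall_accuracy(y_true, y_pred))         # 0.8
print(geometric_mean_accuracy(y_true, y_pred))  # 0.0
```

A classifier that ignores the minority class still scores well on overall accuracy but collapses to zero on the geometric mean, which is why a geometric mean-based criterion is attractive for imbalanced multi-class problems.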

Improving the Effectiveness of Customer Classification Models: A Pre-segmentation Approach (사전 세분화를 통한 고객 분류모형의 효과성 제고에 관한 연구)

  • Chang, Nam-Sik
    • Information Systems Review / v.7 no.2 / pp.23-40 / 2005
  • Discovering customers' behavioral patterns from large data sets and providing customers with corresponding services or products are critical components of managing a business today. However, the diversity of customer needs, coupled with limited resources, suggests that companies should put more effort into understanding and managing specific groups of customers rather than the entire customer base. The key premise of this paper is that the behavioral patterns extracted from specific groups of customers differ from those extracted from the entire customer base. This paper proposes pre-segmentation before developing customer classification models. We collected three sets of customer demographic and transactional data from a credit card company, a telecommunications company, and an insurance company in Korea, and then segmented customers by major variables. Churn prediction models were developed from each segment and from the whole data set using the decision tree induction approach, and were compared in terms of the hit ratio and the simplicity of the generated rules.

A Study on the Combined Decision Tree(C4.5) and Neural Network Algorithm for Classification of Mobile Telecommunication Customer (이동통신고객 분류를 위한 의사결정나무(C4.5)와 신경망 결합 알고리즘에 관한 연구)

  • 이극노;이홍철
    • Journal of Intelligence and Information Systems / v.9 no.1 / pp.139-155 / 2003
  • This paper presents a new methodology, based on decision trees and neural networks, for analyzing and classifying customer patterns in the mobile telecommunication market in order to improve the prediction of credit information. Using the variable selection process of the decision tree, we developed a systematic process for defining the values of the input vector and for generating rules. From a customer management perspective, this research analyzes current customers and derives their patterns so that the company can maintain good customer relationships and give special attention, in advance, to customers with a high potential of terminating their contracts. An implementation of the proposed method shows that its prediction accuracy is higher than that of existing methods such as decision trees (CART, C4.5), regression, neural networks, and a combined model (CART and NN).

Analysis of Longevity Factor through Japan shinise (일본 시니세를 통해 본 장수요인분석)

  • Choi, Seung-Il;Kim, Dong-Il
    • Journal of Digital Convergence / v.13 no.1 / pp.85-92 / 2015
  • Corporate lifespans have recently been growing shorter, and the business environment is changing drastically. In this study, we sought ways to sustain corporate longevity through a critical factor analysis of Japanese shinise (long-established companies). From the analysis of long-lived Japanese shinise companies such as Doraya and Deicoko, we identified critical factors such as credit, tradition, customization, product development, changes in management methods, and the discovery of new global markets. We also analyzed the significance of the critical factors, focusing on individual cases and theories, using the Analytic Hierarchy Process (AHP), and carried out prediction and analysis of the critical longevity factors. However, because the sample may be insufficient, further work needs to collect sufficient data and a larger number of input variables. The results of this study are expected to serve as a useful guide for future analyses of corporate longevity factors.

The Foreign Asset Leverage Effect of Oil & Gas Companies after the Financial Crisis (금융위기 이후 정유산업의 외화자산 레버리지효과 분석)

  • Dong-Gyun Kim
    • Korea Trade Review / v.46 no.2 / pp.19-38 / 2021
  • This study aims to analyze the foreign asset leverage effect on Korean oil & gas companies' foreign profits and to determine how to maintain an appropriate foreign asset volume for reducing exchange risk. For a long time, large Korean companies, including oil companies, have held excessive foreign currency liabilities. For this reason, most large companies have been burdened with hedging exchange risk, and these over-the-limit holdings have eroded total profit and reduced the efficiency of foreign currency asset management. Our paper presents a three-stage analysis that considers diversified exchange risk factors by estimating the transformation of foreign transaction accounts, including annual trends in foreign assets and industry specifics. We also supplement the incomplete estimation method with a practical hedging case investigation. Our research is differentiated by its analysis of four periods, taking period-specific characteristics into account. The FER value of the oil firms ranged from -0.3 to +2.3 over the entire period. The FER values are volatile and irregular, and they do not represent an industry-standard comparative index. The Korean oil firms exceed their credit limits without accurate prediction and finance high-interest-rate funds from foreign-owned banks on the basis of a biased relationship. Since the IMF crisis, the liabilities of global firms have decreased. Above all, oil firms need to finance a minimum limit without opportunity losses based on the demand forecast and to prepare for uncertainty in the market. To reduce exchange risk from the over-the-limit position, we must consider the factors that affect corporate exchange risk across the entire business process, including the contract phase.

Predicting Default of Construction Companies Using Bayesian Probabilistic Approach (베이지안 확률적 접근법을 이용한 건설업체 부도 예측에 관한 연구)

  • Hong, Sungmoon;Hwang, Jaeyeon;Kwon, Taewhan;Kim, Juhyung;Kim, Jaejun
    • Korean Journal of Construction Engineering and Management / v.17 no.5 / pp.13-21 / 2016
  • The insolvency of construction companies that act as main contractors can cause losses for clients through non-fulfillment of construction contracts, and it can harm the financial soundness of other construction companies and suppliers. The construction industry has a distinctive cash flow pattern in which a company wins a project and is paid according to the progress of construction. Insolvency during a project can therefore lead to financial losses, which is why predicting the insolvency of construction companies is so important. The insolvency of Korean construction companies is often predicted with the KMV model, developed in the U.S. in the early 1990s by the KMV (Kealhofer, McQuown and Vasicek) Company, but this model is insufficient for construction companies because it was developed for credit risk assessment of general companies and banks. In addition, the predictive performance of the insolvency probability derived from the KMV value is continually questioned because of the small number of analyzed companies and the lack of data. To resolve these issues, the Bayesian Probabilistic Approach is combined with the existing insolvency probability model. If the prior probability in Bayesian statistics can be estimated appropriately, a reliable posterior probability can be obtained by conditioning on the evidence despite the lack of data. This study therefore measures the Expected Default Frequency (EDF) by applying the Bayesian Probabilistic Approach to the existing insolvency probability model and evaluates its accuracy by comparing the result with the EDF of the existing model.
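The idea of combining a prior with limited evidence to obtain a posterior default probability can be illustrated with a conjugate Beta-binomial update. This is only a generic sketch of Bayesian updating, not the paper's model: the prior parameters and the observed counts are hypothetical, and the link to the KMV-based EDF is not modeled here.

```python
def beta_posterior(prior_a, prior_b, defaults, firms):
    """Conjugate Beta-binomial update; returns the posterior (a, b)."""
    return prior_a + defaults, prior_b + (firms - defaults)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical prior: roughly equivalent to having seen 2 defaults in 100 firms.
prior_a, prior_b = 2.0, 98.0

# Hypothetical evidence: 3 defaults observed among 20 construction firms.
post_a, post_b = beta_posterior(prior_a, prior_b, defaults=3, firms=20)

print(beta_mean(prior_a, prior_b))  # prior default probability
print(beta_mean(post_a, post_b))    # posterior default probability
```

With few observations the posterior stays close to the prior, which is the property the abstract relies on when data are scarce.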

Determinants of IPO Failure Risk and Price Response in Kosdaq (코스닥 상장 시 실패위험 결정요인과 주가반응에 관한 연구)

  • Oh, Sung-Bae;Nam, Sam-Hyun;Yi, Hwa-Deuk
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.5 no.4 / pp.1-34 / 2010
  • Failure rates of Kosdaq IPO firms have recently been increasing and their survival rates tend to be very low. When these firms fail, often after receiving various forms of government financial support, they may inflict severe financial damage on investors as well as on the economy as a whole. To ensure investors' confidence in Kosdaq and to foster promising and healthy businesses, it is necessary to assess their intrinsic value and survivability precisely. This study investigates what contributes to the failure of IPO firms and analyzes how these elements are factored into the firms' stock returns. Failure risks are assessed at the time of IPO. This paper considers factors reflecting IPO characteristics: a firm's underwriter prestige, auditor's quality, IPO offer price, firm's age, and IPO proceeds. The study further examines how, if at all, the failure risks present at IPO are reflected in post-IPO stock prices. The sample includes 98 Kosdaq firms that failed and 569 healthy firms classified into the same business categories, and Logit models are used to estimate the probability of failure. Empirical results indicate that auditor's quality, IPO offer price, firm's age, and IPO proceeds show significant relevance to failure risks at the time of IPO. Among the other variables, firm size and ROA, previously deemed significantly related to failure risks, do not in fact show significant relevance to those risks, whereas financial leverage does. This illustrates the efficacy of a model that appropriately reflects the attributes of IPO firms. Also, even though R&D expenditures were believed to be value relevant by previous studies, this study reveals that R&D is not a significant factor related to failure risks. In examining the relation between failure risks and stock prices, this study finds that failure risks are negatively related to 1- or 2-year size-adjusted abnormal returns after IPO.
The results of this study may provide useful knowledge for government regulatory officials in contemplating pertinent policy and for credit analysts in properly evaluating a firm's credit standing.
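How a logit model turns firm characteristics into a failure probability can be sketched as follows. The variable names echo the factors listed in the abstract, but every coefficient, the intercept, and the input values are hypothetical placeholders chosen for illustration; they are not estimates from the study.

```python
import math

def failure_probability(features, coefficients, intercept):
    """Logit model: P(failure) = 1 / (1 + exp(-(intercept + sum(b_k * x_k))))."""
    z = intercept + sum(coefficients[k] * features[k] for k in coefficients)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients (NOT taken from the paper).
coefficients = {"auditor_quality": -0.8, "firm_age": -0.05, "leverage": 1.2}
intercept = -1.0

# Hypothetical firm: reputable auditor (1), 10 years old, leverage ratio 0.6.
firm = {"auditor_quality": 1.0, "firm_age": 10.0, "leverage": 0.6}

print(failure_probability(firm, coefficients, intercept))
```

The logistic link keeps the output between 0 and 1, so the fitted value can be read directly as an estimated probability of failure at the time of IPO.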

Statistical Analysis of Extreme Values of Financial Ratios (재무비율의 극단치에 대한 통계적 분석)

  • Joo, Jihwan
    • Knowledge Management Research / v.22 no.2 / pp.247-268 / 2021
  • Among financial ratios, investors mainly use the price-earnings ratio (PER) and the price-to-book ratio (PBR) for valuation and investment decision-making. I analyze these two basic financial ratios from a statistical perspective. Financial ratios contain key accounting numbers that reflect firm fundamentals and are useful for valuation and for risk analysis such as enterprise credit evaluation and default prediction. The distribution of financial data tends to be extremely heavy-tailed: PER and PBR show exceedingly high kurtosis, and their extreme cases often contain significant information on financial risk. Extreme Value Theory is therefore needed to fit the right tail more precisely. I introduce both the generalized Pareto distribution (GPD) and the exponentiated generalized Pareto distribution (exGPD). GPD is the conventionally preferred model in Extreme Value Theory, and exGPD is the log-transformed distribution of GPD. exGPD was recently proposed as an alternative to GPD (Lee and Kim, 2019). First, I conduct a simulation to compare the performance of the two distributions using goodness-of-fit measures and estimates of the 90th to 99th percentiles. I also conduct an empirical analysis of Information Technology firms in Korea. exGPD shows better performance, especially for PBR, suggesting that exGPD could be an alternative to GPD for the analysis of financial ratios.
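The relationship between the two distributions can be sketched through their quantile functions: if X follows a GPD with scale σ and shape ξ, then Y = log X follows the exGPD, so exGPD quantiles are the logarithms of GPD quantiles. The sketch below uses the standard GPD formulas for excesses over a threshold; the parameter values are hypothetical and no fitting is performed.

```python
import math

def gpd_cdf(x, sigma, xi):
    """GPD distribution function for x >= 0 (and x < -sigma/xi when xi < 0)."""
    if xi == 0.0:
        return 1.0 - math.exp(-x / sigma)
    return 1.0 - (1.0 + xi * x / sigma) ** (-1.0 / xi)

def gpd_quantile(p, sigma, xi):
    """Inverse of gpd_cdf for 0 < p < 1."""
    if xi == 0.0:
        return -sigma * math.log(1.0 - p)
    return sigma / xi * ((1.0 - p) ** (-xi) - 1.0)

def exgpd_quantile(p, sigma, xi):
    """Quantile of Y = log X where X ~ GPD(sigma, xi)."""
    return math.log(gpd_quantile(p, sigma, xi))

# Hypothetical tail parameters: scale 1.0, shape 0.5 (heavy right tail).
for p in (0.90, 0.95, 0.99):
    print(p, gpd_quantile(p, 1.0, 0.5), exgpd_quantile(p, 1.0, 0.5))
```

Because the log transform compresses the right tail, high percentiles that are far apart on the GPD scale sit much closer together on the exGPD scale.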

Stand Yield Table and Commercial Timber Volume of Eucalyptus Pellita and Acacia Mangium Plantations in Indonesia (인도네시아 유칼립투스 및 아카시아 조림지의 임분수확표 및 이용가능 목재생산량 추정)

  • Son, Yeong-Mo;Kim, Hoon;Lee, Ho-Young;Kim, Cheol-Min;Kim, Cheol-Sang;Kim, Jae-Weon;Joo, Rin-Won;Lee, Kyeong-Hak
    • Journal of Korean Society of Forest Science / v.99 no.1 / pp.9-15 / 2010
  • This study was conducted to develop a stand growth model and a stand yield table for Eucalyptus pellita and Acacia mangium plantations in Kalimantan, Indonesia. To develop the stand growth model, the Weibull probability density function, a diameter class model, was applied. Developing a stand growth model by site index and stand age generally requires a hierarchy of steps: estimation, recovery, and prediction of the diameter class model. A number of growth equations were also involved in each step to estimate diameter, height, basal area, and minimum or maximum diameter. To examine whether the growth equations are adequate for Eucalyptus pellita and Acacia mangium plantations, a fitness index was analyzed for each equation. The fitness indices ranged from 65 to 89% for Eucalyptus pellita plantations and from 72 to 95% for Acacia mangium plantations. Because the equations were highly adequate for the plantations, a stand yield table was developed from the resulting growth model and applied to estimate stand growth at a medium site index over a 10-year period. The highest annual stand growth of Eucalyptus pellita plantations was estimated at 21.25 $m^3$/ha, while that of Acacia mangium plantations was 27.5 $m^3$/ha. In terms of annual stand growth, Acacia mangium plantations appeared to be more beneficial than Eucalyptus pellita plantations. To estimate the commercial timber volume available from the plantations, we assumed that logs would first be cut to a length of 2.7 m and the remainder to a length of 1.5 m. The commercial timber volume available from Eucalyptus pellita plantations was 68.0 $m^3$/ha, 33% of the total stand volume of 203.2 $m^3$/ha. For Acacia mangium plantations, 96.7 $m^3$/ha of commercial timber was available, 42% of the total of 232.9 $m^3$/ha.
By providing sound information about stand growth in Eucalyptus pellita and Acacia mangium plantations, this study may be useful for those undertaking or considering overseas plantations for merchantable timber production or carbon credits in tropical regions.
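How a Weibull diameter class model distributes stems across diameter classes, and how the reported commercial shares follow from the stated volumes, can be sketched as follows. The three-parameter Weibull form and all parameter values and class limits are assumptions made for illustration; only the four volume figures come from the abstract.

```python
import math

def weibull_cdf(d, location, scale, shape):
    """Three-parameter Weibull distribution function for diameter d."""
    if d <= location:
        return 0.0
    return 1.0 - math.exp(-(((d - location) / scale) ** shape))

def class_proportions(edges, location, scale, shape):
    """Share of stems falling in each diameter class [edges[i], edges[i+1])."""
    cdf = [weibull_cdf(e, location, scale, shape) for e in edges]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

def commercial_share(commercial, total):
    """Commercial timber volume as a fraction of total stand volume."""
    return commercial / total

# Hypothetical diameter class limits (cm) and Weibull parameters.
edges = [5, 10, 15, 20, 25, 30]
props = class_proportions(edges, location=5.0, scale=12.0, shape=2.5)
print([round(p, 3) for p in props])

# Volumes in m^3/ha as reported in the abstract.
print(round(100 * commercial_share(68.0, 203.2)))  # Eucalyptus pellita
print(round(100 * commercial_share(96.7, 232.9)))  # Acacia mangium
```

Multiplying each class proportion by stems per hectare and a per-tree volume for that class is one common way such a model feeds a stand yield table, although the abstract does not spell out the authors' exact procedure.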