• Title/Summary/Keyword: Data-driven based Method

Search Result 294, Processing Time 0.019 seconds

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review
    • /
    • v.16 no.3
    • /
    • pp.161-177
    • /
    • 2014
  • Corporate credit rating assessment consists of complicated processes in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive since domain experts should be employed to assess the ratings. As a result, the data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural network (ANN), and multiclass support vector machine (MSVM) have been applied to corporate credit rating.2) Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model, and appy it to corporate credit rating prediction in order to enhance the accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies like Lorena and de Carvalho (2008), and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results from the studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As a tool for optimizing the kernel parameters and the feature subset selection, we suggest genetic algorithm (GA). GA is known as an efficient and effective search method that attempts to simulate the biological evolution phenomenon. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results. Especially, mutation operator prevents GA from falling into the local optima, thus we can find the globally optimal or near-optimal solution using it. GA has popularly been applied to search optimal parameters or feature subset selections of AI techniques including MSVM. With these reasons, we also adopt GA as an optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is in bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, and their credit ratings. Using various statistical methods including the one-way ANOVA and the stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e. credit rating, was labeled as four classes: 1(A1); 2(A2); 3(A3); 4(B and C). 80 percent of total data for each class was used for training, and remaining 20 percent was used for validation. And, to overcome small sample size, we applied five-fold cross validation to our dataset. In order to examine the competitiveness of the proposed model, we also experimented several comparative models including MDA, MLOGIT, CBR, ANN and MSVM. In case of MSVM, we adopted One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate approaches among various MSVM approaches. GAMSVM was implemented using LIBSVM-an open-source software, and Evolver 5.5-a commercial software enables GA. Other comparative models were experimented using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model-GAMSVM-outperformed all the competitive models. In addition, the model was found to use less independent variables, but to show higher accuracy. In our experiments, five variables such as X7 (total debt), X9 (sales per employee), X13 (years after founded), X15 (accumulated earning to total asset), and X39 (the index related to the cash flows from operating activity) were found to be the most important factors in predicting the corporate credit ratings. However, the values of the finally selected kernel parameters were found to be almost same among the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than those of other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.

Dietary total sugar intake of Koreans: Based on the Korea National Health and Nutrition Examination Survey (KNHANES), 2008-2011 (한국인의 총 당류 섭취실태 평가: 2008~2011년 국민건강영양조사 자료를 이용하여)

  • Lee, Haeng-Shin;Kwon, Sung-Ok;Yon, Miyong;Kim, Dohee;Lee, Jee-Yeon;Nam, Jiwoon;Park, Seung-Joo;Yeon, Jee-Young;Lee, Soon-Kyu;Lee, Hye-Young;Kwon, Oh-Sang;Kim, Cho-Il
    • Journal of Nutrition and Health
    • /
    • v.47 no.4
    • /
    • pp.268-276
    • /
    • 2014
  • Purpose: The aim of this study is to estimate total sugar intake and identify major food sources of total sugar intake in the diet of the Korean population. Methods: Dietary intake data of 33,745 subjects aged one year and over from the KNHANES 2008-2011 were used in the analysis. Information on dietary intake was obtained by one day 24-hour recall method in KNHANES. A database for total sugar content of foods reported in the KNHANES was established using Release 25 of the U.S. Department of Agriculture National Nutrient Database for Standard Reference, a total sugar database from the Ministry of Food and Drug Safety, and information from nutrition labeling of processed foods. With this database, total sugar intake of each subject was estimated from dietary intake data using SAS. Results: Mean total sugar intake of Koreans was 61.4 g/person/day, corresponding to 12.8% of total daily energy intake. More than half of this amount (35.0 g/day, 7.1% of daily energy intake) was from processed foods. The top five processed food sources of total sugar intake for Koreans were granulated sugar, carbonated beverages, coffee, breads, and fruit and vegetable drinks. Compared to other age groups, total sugar intake of adolescents and young adults was much higher (12 to 18 yrs, 69.6 g/day and 19 to 29 yrs, 68.4 g/day) with higher beverage intake that beverage-driven sugar amounted up to 25% of total sugar intake. Conclusion: This study revealed that more elaborated and customized measures are needed for control of sugar intake of different subpopulation groups, even though current total sugar intake of Koreans was within the range (10-20% of daily energy intake) recommended by Dietary Reference Intakes for Koreans. In addition, development of a more reliable database on total sugar and added sugar content of foods commonly consumed by Koreans is warranted.

Water Balance Projection Using Climate Change Scenarios in the Korean Peninsula (기후변화 시나리오를 활용한 미래 한반도 물수급 전망)

  • Kim, Cho-Rong;Kim, Young-Oh;Seo, Seung Beom;Choi, Su-Woong
    • Journal of Korea Water Resources Association
    • /
    • v.46 no.8
    • /
    • pp.807-819
    • /
    • 2013
  • This study proposes a new methodology for future water balance projection considering climate change by assigning a weight to each scenario instead of inputting future streamflows based on GCMs into a water balance model directly. K-nearest neighbor algorithm was employed to assign weights and streamflows in non-flood period (October to the following June) was selected as the criterion for assigning weights. GCM-driven precipitation was input to TANK model to simulate future streamflow scenarios and Quantile Mapping was applied to correct bias between GCM hindcast and historical data. Based on these bias-corrected streamflows, different weights were assigned to each streamflow scenarios to calculate water shortage for the projection periods; 2020s (2010~2039), 2050s (2040~2069), and 2080s (2070~2099). As a result by applying the proposed methodology to project water shortage over the Korean Peninsula, average water shortage for 2020s is projected to increase to 10~32% comparing to the basis (1967~2003). In addition, according to getting decreased in streamflows in non-flood period gradually by 2080s, average water shortage for 2080s is projected to increase up to 97% (516.5 million $m^3/yr$) as maximum comparing to the basis. While the existing research on climate change gives radical increase in future water shortage, the results projected by the weighting method shows conservative change. This study has significance in the applicability of water balance projection regarding climate change, keeping the existing framework of national water resources planning and this lessens the confusion for decision-makers in water sectors.

A Thermal Time-Driven Dormancy Index as a Complementary Criterion for Grape Vine Freeze Risk Evaluation (포도 동해위험 판정기준으로서 온도시간 기반의 휴면심도 이용)

  • Kwon, Eun-Young;Jung, Jea-Eun;Chung, U-Ran;Lee, Seung-Jong;Song, Gi-Cheol;Choi, Dong-Geun;Yun, Jin-I.
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.8 no.1
    • /
    • pp.1-9
    • /
    • 2006
  • Regardless of the recent observed warmer winters in Korea, more freeze injuries and associated economic losses are reported in fruit industry than ever before. Existing freeze-frost forecasting systems employ only daily minimum temperature for judging the potential damage on dormant flowering buds but cannot accommodate potential biological responses such as short-term acclimation of plants to severe weather episodes as well as annual variation in climate. We introduce 'dormancy depth', in addition to daily minimum temperature, as a complementary criterion for judging the potential damage of freezing temperatures on dormant flowering buds of grape vines. Dormancy depth can be estimated by a phonology model driven by daily maximum and minimum temperature and is expected to make a reasonable proxy for physiological tolerance of buds to low temperature. Dormancy depth at a selected site was estimated for a climatological normal year by this model, and we found a close similarity in time course change pattern between the estimated dormancy depth and the known cold tolerance of fruit trees. Inter-annual and spatial variation in dormancy depth were identified by this method, showing the feasibility of using dormancy depth as a proxy indicator for tolerance to low temperature during the winter season. The model was applied to 10 vineyards which were recently damaged by a cold spell, and a temperature-dormancy depth-freeze injury relationship was formulated into an exponential-saturation model which can be used for judging freeze risk under a given set of temperature and dormancy depth. Based on this model and the expected lowest temperature with a 10-year recurrence interval, a freeze risk probability map was produced for Hwaseong County, Korea. The results seemed to explain why the vineyards in the warmer part of Hwaseong County have been hit by more freeBe damage than those in the cooler part of the county. A dormancy depth-minimum temperature dual engine freeze warning system was designed for vineyards in major production counties in Korea by combining the site-specific dormancy depth and minimum temperature forecasts with the freeze risk model. In this system, daily accumulation of thermal time since last fall leads to the dormancy state (depth) for today. The regional minimum temperature forecast for tomorrow by the Korea Meteorological Administration is converted to the site specific forecast at a 30m resolution. These data are input to the freeze risk model and the percent damage probability is calculated for each grid cell and mapped for the entire county. Similar approaches may be used to develop freeze warning systems for other deciduous fruit trees.