• Title/Summary/Keyword: stepwise variable selection


Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review / v.16 no.3 / pp.161-177 / 2014
  • Corporate credit rating assessment is a complicated process that takes into account various factors describing a company. Such assessment is known to be very expensive because domain experts must be employed to assign the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural networks (ANN), and multiclass support vector machines (MSVM), have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results of studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As the tool for optimizing the kernel parameters and the feature subset selection, we use a genetic algorithm (GA). GA is known as an efficient and effective search method that attempts to simulate biological evolution. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results.
In particular, the mutation operator helps keep GA from becoming trapped in local optima, so a globally optimal or near-optimal solution can be found. GA has been widely applied to search for optimal parameters or feature subsets of AI techniques, including MSVM. For these reasons, we also adopt GA as the optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is in bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, together with their credit ratings. Using various statistical methods, including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e. credit rating, was labeled as four classes: 1 (A1); 2 (A2); 3 (A3); 4 (B and C). For each class, 80 percent of the data was used for training and the remaining 20 percent for validation. To compensate for the small sample size, we applied five-fold cross-validation to our dataset. In order to examine the competitiveness of the proposed model, we also tested several comparative models, including MDA, MLOGIT, CBR, ANN, and MSVM. In the case of MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source package, and Evolver 5.5, a commercial package that provides GA. The other comparative models were run using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model, GAMSVM, outperformed all the competing models.
In addition, the model was found to use fewer independent variables while showing higher accuracy. In our experiments, five variables, X7 (total debt), X9 (sales per employee), X13 (years since founding), X15 (accumulated earnings to total assets), and X39 (the index related to the cash flows from operating activity), were found to be the most important factors in predicting corporate credit ratings. However, the values of the finally selected kernel parameters were found to be almost the same across the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
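
The simultaneous search over kernel parameters and feature subsets described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation (which used LIBSVM and Evolver 5.5). Everything named here is an assumption: the chromosome layout (a boolean feature mask plus `log2_C` and `log2_gamma`, which presumes an RBF kernel the abstract does not specify), the search ranges, tournament selection, elitism, and the `toy_fitness` function. In a real study the fitness would be the cross-validated accuracy of an MSVM trained with the encoded settings.

```python
import random

def run_ga(fitness, n_features, pop_size=20, generations=30,
           mutation_rate=0.1, seed=0):
    """Evolve (feature_mask, log2_C, log2_gamma) to maximize `fitness`.

    Assumes n_features >= 2. All ranges and operators are illustrative choices.
    """
    rng = random.Random(seed)

    def random_individual():
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        return (mask, rng.uniform(-5, 15), rng.uniform(-15, 3))

    def crossover(a, b):
        # One-point crossover on the mask; each kernel parameter is
        # inherited from one parent chosen at random.
        cut = rng.randrange(1, n_features)
        mask = a[0][:cut] + b[0][cut:]
        return (mask, rng.choice([a[1], b[1]]), rng.choice([a[2], b[2]]))

    def mutate(ind):
        # Flip mask bits and perturb the parameters with small probability.
        mask = [(not bit) if rng.random() < mutation_rate else bit
                for bit in ind[0]]
        log_c = ind[1] + rng.gauss(0, 1) if rng.random() < mutation_rate else ind[1]
        log_g = ind[2] + rng.gauss(0, 1) if rng.random() < mutation_rate else ind[2]
        return (mask, log_c, log_g)

    def tournament(scored):
        # Binary tournament selection: the fitter of two random individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [random_individual() for _ in range(pop_size)]
    best_score, best = float("-inf"), None
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        top_score, top = max(scored, key=lambda s: s[0])
        if top_score > best_score:
            best_score, best = top_score, top
        # Elitism: carry the best individual so far into the next generation.
        children = [mutate(crossover(tournament(scored), tournament(scored)))
                    for _ in range(pop_size - 1)]
        population = [best] + children
    return best_score, best

def toy_fitness(ind):
    # Hypothetical stand-in for cross-validated accuracy: features 0 and 2
    # are "informative", extra features cost a little, and the parameters
    # are softly pulled toward log2_C = 3 and log2_gamma = -5.
    mask, log_c, log_g = ind
    hits = sum(1 for i in (0, 2) if mask[i])
    extras = sum(mask) - hits
    return hits - 0.2 * extras - 0.01 * ((log_c - 3) ** 2 + (log_g + 5) ** 2)

if __name__ == "__main__":
    score, (mask, log_c, log_g) = run_ga(toy_fitness, n_features=6)
    print(score, mask, log_c, log_g)
```

Encoding the feature mask and the kernel parameters in one chromosome lets a single fitness evaluation score both choices together, which is the point of optimizing them simultaneously rather than in sequence.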

Association between Type D Personality and the Somatic Symptom Complaints in Depressive Patients (우울증 환자에서 D형 인격과 신체 증상 호소와의 관련성)

  • Park, Wu-Ri;Jeong, Seong-Hoon
    • Korean Journal of Psychosomatic Medicine / v.21 no.1 / pp.18-26 / 2013
  • Objectives: Type D personality was originally introduced to study the role of personality in predicting outcomes of heart disease. However, research has shown that other medical conditions are also affected by this personality type. The purpose of this study was to evaluate the relationship between Type D personality and somatic symptom complaints in depressive patients. Methods: Eighty-two individuals diagnosed with depressive disorder were included. Type D personality was measured with the DS14. The Patient Health Questionnaire (PHQ) 9 and 15 were used to measure depression severity and somatization tendencies, respectively. For alexithymia, the TAS-20 was used. Student's t-test and linear regression analysis were performed. The best regression model was determined by stepwise variable selection. Results: More than half of the subjects (56%) reported somatic symptoms of at least medium severity according to PHQ-15 criteria. About two-thirds of the subjects (63.4%) were classified as having Type D personality. The mean PHQ-15 score of the Type D individuals was significantly higher than that of the remaining subjects (PHQ-15 mean = 12.7, $p=8.2\times10^{-7}$). The best regression model included age, PHQ-9 score, and negative affectivity (NA) subscale score as predictor variables. Among these, only the coefficients of age ($p=1.5\times10^{-3}$) and NA score ($p=1.5\times10^{-7}$) were statistically significant. Conclusions: The results showed that Type D personality was one of the strong predictors of somatic complaints among depressive individuals. The finding that negative affectivity, rather than social inhibition, was more closely associated with somatization tendencies does not fully agree with the traditional explanation that an inability to express negative emotion predisposes individuals to somatic symptoms. The finding that alexithymia was not a significant predictor is also at odds with that explanation.
However, the high correlations between the NA and social inhibition (SI) subscores (r=0.65) and between the NA and TAS-20 scores (r=0.44) may have masked additional effects of social inhibition and alexithymia. Further research with a larger sample is needed to investigate the effects of the latter two components over and above the effect of negative affectivity on somatic complaints in depressive patients.
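
The abstract does not say which stepwise procedure or criterion was used. As one possible reading, the sketch below shows forward selection for linear regression driven by AIC; the choice of forward selection, the AIC criterion, the function names, and the toy data are all assumptions made for illustration. It uses only the standard library and assumes the predictors are not perfectly collinear and the fit is not exact (residual sum of squares above zero).

```python
import math

def ols_rss(columns, y):
    """Residual sum of squares for y regressed on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations: [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))

def aic(rss, n, k):
    # Gaussian AIC up to an additive constant; k = number of coefficients.
    return n * math.log(rss / n) + 2 * k

def forward_stepwise(predictors, y):
    """predictors: dict mapping name -> list of values. Returns names in entry order."""
    n = len(y)
    selected = []
    best = aic(ols_rss([], y), n, 1)  # intercept-only model
    while True:
        scores = []
        for name in predictors:
            if name in selected:
                continue
            cols = [predictors[s] for s in selected + [name]]
            scores.append((aic(ols_rss(cols, y), n, len(cols) + 1), name))
        if not scores:
            break
        score, name = min(scores)
        if score >= best:  # stop when no candidate lowers the AIC
            break
        selected.append(name)
        best = score
    return selected

if __name__ == "__main__":
    # Made-up data: y depends on x1 only, plus small noise.
    x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, -0.05, 0.1]
    y = [2 * a + e for a, e in zip(x1, noise)]
    print(forward_stepwise({"x1": x1, "x2": x2}, y))
```

Because correlated predictors compete for the same explained variance, a procedure like this can drop a variable that matters, which is the masking concern the authors raise for the SI and TAS-20 scores.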


Factors Associated with Care Burden among Family Caregivers of Terminally Ill Cancer Patients (말기암환자 가족 간병인의 간병 부담과 관련된 요인)

  • Lee, Jee Hye;Park, Hyun Kyung;Hwang, In Cheol;Kim, Hyo Min;Koh, Su-Jin;Kim, Young Sung;Lee, Yong Joo;Choi, Youn Seon;Hwang, Sun Wook;Ahn, Hong Yup
    • Journal of Hospice and Palliative Care / v.19 no.1 / pp.61-69 / 2016
  • Purpose: It is important to alleviate the care burden of terminal cancer patients and their families. This study investigated the factors associated with care burden among family caregivers (FCs) of terminally ill cancer patients. Methods: We analyzed data from 289 FCs of terminal cancer patients who were admitted to the palliative care units of seven medical centers in Korea. Care burden was assessed using the Korean version of the Caregiver Reaction Assessment (CRA) scale, which comprises five domains. A multivariate logistic regression model with stepwise variable selection was used to identify factors associated with care burden. Results: Different associated factors were identified in each CRA domain. Emotional factors had a broad influence on care burden. FCs with emotional distress were more likely to experience changes to their daily routine (adjusted odds ratio (aOR), 2.54; 95% confidence interval (CI), 1.29~5.02), lack of family support (aOR, 2.27; 95% CI, 1.04~4.97), and health issues (aOR, 5.44; 95% CI, 2.50~11.88). Family functionality clearly reflected a lack of support, and severe family dysfunction was linked to financial issues as well. FCs without religion or comorbid conditions felt more burdened. The caregiving duration and daily caregiving hours significantly predicted FCs' lifestyle changes and physical burden. FCs who were employed, had weak social support, or could not visit frequently had low self-esteem. Conclusion: This study indicates that it is helpful to understand FCs' emotional status and family functions when assessing their care burden. Thus, efforts are needed to lessen their financial burden through social support systems.
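
The adjusted odds ratios and confidence intervals reported above are the kind of output a logistic regression produces by exponentiating a coefficient and its Wald interval. The sketch below illustrates that relationship under the assumption, not stated in the abstract, that the intervals are Wald-type; the function names are hypothetical, and the second function simply back-solves a coefficient and standard error from a reported interval.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald confidence limits from a logistic coefficient."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

def beta_se_from_ci(odds_ratio, lower, upper, z=Z_95):
    """Recover the coefficient and standard error implied by a reported interval."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

if __name__ == "__main__":
    # Reported for changes to daily routine: aOR 2.54, 95% CI 1.29~5.02.
    beta, se = beta_se_from_ci(2.54, 1.29, 5.02)
    print(beta, se)
    print(odds_ratio_ci(beta, se))
```

Rebuilding the interval from the implied coefficient gives limits close to the reported 1.29 and 5.02, which is what one would expect if the interval is symmetric on the log-odds scale; small differences come from rounding in the published figures.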