• Title/Summary/Keyword: Selections

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review / v.16 no.3 / pp.161-177 / 2014
  • Corporate credit rating assessment is a complicated process that takes into account various factors describing a company. It is known to be very expensive because domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural network (ANN), and multiclass support vector machine (MSVM), have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Likewise, the results of studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As the tool for optimizing the kernel parameters and the feature subset selection, we suggest the genetic algorithm (GA). GA is known as an efficient and effective search method that attempts to simulate biological evolution. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results. 
In particular, the mutation operator helps prevent GA from becoming trapped in local optima, so a globally optimal or near-optimal solution can be found. GA has been widely applied to search for optimal parameters or feature subsets of AI techniques, including MSVM. For these reasons, we also adopt GA as the optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, together with their credit ratings. Using various statistical methods, including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e., the credit rating, was labeled with four classes: 1 (A1), 2 (A2), 3 (A3), and 4 (B and C). Eighty percent of the data in each class was used for training, and the remaining 20 percent was used for validation. To overcome the small sample size, we applied five-fold cross-validation to our dataset. To examine the competitiveness of the proposed model, we also tested several comparative models, including MDA, MLOGIT, CBR, ANN, and MSVM. For MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source software package, and Evolver 5.5, a commercial software package that provides GA. The other comparative models were run using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model, GAMSVM, outperformed all the comparative models. 
In addition, the model was found to use fewer independent variables while showing higher accuracy. In our experiments, five variables, X7 (total debt), X9 (sales per employee), X13 (years since founding), X15 (accumulated earnings to total assets), and X39 (the index related to the cash flows from operating activity), were found to be the most important factors in predicting corporate credit ratings. However, the values of the finally selected kernel parameters were found to be almost the same across the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
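
The simultaneous search over feature subsets and kernel parameters described in this abstract can be pictured with a minimal sketch. It is not the authors' implementation (they report using LIBSVM with Evolver 5.5). The chromosome layout, the log2 search ranges, the operator settings, and `toy_fitness` are all assumptions made for illustration. In a real run, `toy_fitness` would be replaced by the cross-validated accuracy of an MSVM trained on the selected features.

```python
import random

N_FEATURES = 14  # number of candidate financial ratios reported in the abstract

# Assumed search ranges (log2 scale) for two RBF-style kernel parameters.
C_RANGE = (-5.0, 15.0)
GAMMA_RANGE = (-15.0, 3.0)


def random_chromosome(rng):
    """One candidate: a 0/1 feature mask plus two kernel parameters."""
    mask = [rng.randint(0, 1) for _ in range(N_FEATURES)]
    return mask, rng.uniform(*C_RANGE), rng.uniform(*GAMMA_RANGE)


def toy_fitness(chrom):
    """Hypothetical stand-in for cross-validated MSVM accuracy."""
    mask, log2_c, log2_gamma = chrom
    target = [1 if i % 3 == 0 else 0 for i in range(N_FEATURES)]
    agreement = sum(m == t for m, t in zip(mask, target)) / N_FEATURES
    return agreement - 0.01 * (abs(log2_c - 3.0) + abs(log2_gamma + 5.0))


def tournament(pop, scores, rng, k=3):
    """Selection: the best of k randomly drawn individuals becomes a parent."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """One-point crossover on the mask; each parameter comes from either parent."""
    cut = rng.randint(1, N_FEATURES - 1)
    return a[0][:cut] + b[0][cut:], rng.choice([a[1], b[1]]), rng.choice([a[2], b[2]])


def clip(value, bounds):
    return max(bounds[0], min(bounds[1], value))


def mutate(chrom, rng, rate=0.05):
    """Mutation: flip mask bits and jitter parameters with a small probability."""
    mask = [1 - g if rng.random() < rate else g for g in chrom[0]]
    log2_c, log2_gamma = chrom[1], chrom[2]
    if rng.random() < rate:
        log2_c = clip(log2_c + rng.gauss(0.0, 1.0), C_RANGE)
    if rng.random() < rate:
        log2_gamma = clip(log2_gamma + rng.gauss(0.0, 1.0), GAMMA_RANGE)
    return mask, log2_c, log2_gamma


def run_ga(fitness, pop_size=30, generations=40, seed=0):
    """Evolve a population; return the best chromosome and the best score per generation."""
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        best_index = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best_index])
        children = [pop[best_index]]  # elitism: the current best survives unchanged
        while len(children) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    scores = [fitness(c) for c in pop]
    best_index = max(range(pop_size), key=lambda i: scores[i])
    history.append(scores[best_index])
    return pop[best_index], history


if __name__ == "__main__":
    best, history = run_ga(toy_fitness)
    selected = [i + 1 for i, bit in enumerate(best[0]) if bit]
    print("selected feature positions:", selected)
    print("log2(C), log2(gamma):", best[1], best[2])
```

Packing the mask and the kernel parameters into one chromosome lets a single fitness evaluation score both choices together, which is the sense of "simultaneously" in the abstract. The elitism step is a design choice of this sketch. It guarantees that the best score never decreases from one generation to the next.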

Studies on the Inheritance of Heading Date in Wheat(Triticum aestivum L. em Thell) (소맥(Triticum aestivum L. em Thell)의 출수기 유전에 관한 연구)

  • Chang-Hwan Cho
    • KOREAN JOURNAL OF CROP SCIENCE / v.15 / pp.1-31 / 1974
  • Introducing genes for earliness is important for developing early varieties of winter wheat. In order to obtain basic information on the response of heading to different day length and temperature treatments and on the inheritance of heading date, experiments were conducted in the field and greenhouse of the Crop Experiment Station, Suwon. The varieties used in these experiments were the early variety Yecora F70, the medium varieties Suke #169, Parker and Yukseung #3, and the late varieties Changkwang, Bezostaia, Sturdy and Blueboy. The parents and F$_1$s of partial diallel crosses of these eight varieties were subjected to the following four treatments: 1. high temperature and long day, 2. high temperature and short day, 3. low temperature and long day, and 4. low temperature and short day. The same materials were also grown under field conditions. The parents and the F$_1$ and F$_2$ generations were also grown both in the greenhouse under high temperature and short day and in the field. The results obtained are summarized as follows: 1. Temperature and daylength had no effect on the number of leaves on the main stem when the varieties were vernalized. The number of main stem leaves was lower for spring-type varieties than for winter-type varieties. 2. The effects of temperature and daylength on the days to flag leaf opening depended on the speed of leaf emergence. Leaf emergence was faster for lower leaves than for upper leaves. 3. The response of varieties to short day and long day (earliness in the narrow sense) was found to be the direct factor governing the physiology of heading date in vernalized varieties. Large varietal differences in heading date were found under the high temperature and short day treatment, whereas smaller differences were found under the high temperature and long day, low temperature and long day, and low temperature and short day treatments. The smallest varietal difference in heading date was found under field conditions. 
4. Changkwang and Parker were found to be the most sensitive to the short day treatment (photosensitive), and the heading of these varieties was delayed by it. No great differences were found among the other varieties. 5. Varietal differences in heading date due to daylength were greater under high temperature than under low temperature. 6. Varietal differences in heading date due to temperature were not great, but in general the varieties headed earlier under high temperature than under low temperature. 7. Earliness of heading was due to partial dominance effects of the genes involved under every condition. The degree of dominance was greater under the short day than under the long day treatment. 8. The varietal differences in heading date under high temperature and long day were due to earliness in the narrow sense (response to long day). The degree of dominance was greater for Yecora F70, a spring type, than for the winter-type varieties. Little or no difference in the degree of dominance was found among the winter-type varieties. The number of effective factors estimated for earliness in the narrow sense was one pair of alleles, together with minor genes. 9. The insensitivity of varieties to the short day treatment in heading date was due to a single dominant gene effect. Under low temperature, the sensitivity of varieties to the short day treatment was less apparent. 10. The short-day and long-day sensitivities of the varieties (earliness in the narrow sense) appeared to be due to partial dominance of earliness over lateness. Strictly speaking, the degrees of dominance should be distinguished. 11. Dominant gene effects were found for the thermo-sensitivity of varieties, and the effect was less significant than that for earliness in the narrow sense. 12. One pair of alleles, ee and EE, for photosensitivity was responsible for the difference in heading date between Changkwang and Suke #169. Two pairs of alleles, ee, enen and EE, EnEn, appeared to be responsible for the difference between Changkwang and Yecora F70. 
The effects of EE and EnEn on earliness were additive, and the effects of EE were greater than those of EnEn under short day. However, the effects of EE were not evident under long day, whereas the effects of EnEn were observed under long day. 13. Two pairs of dominant alleles for earliness were estimated from the analysis of the F$_1$ diallels in the field, but the effects of these alleles in the F$_2$ were not apparent because of the low temperature and short day conditions in the early part of growth and the high temperature and long day conditions in the later part of growth. The F$_2$ population showed continuous variation due to environmental effects and other minor gene effects. 14. The heritabilities for heading date ranged from 0.51 to 0.72, indicating that selection in early generations might be effective. The extent of heritability for heading date varied with environment; higher heritability was obtained under the short day and high temperature treatments than under the long day and low temperature treatments. The heritabilities of heading date due to the response to short day were 0.86 under high temperature and 0.76 under low temperature. The heritabilities of heading date due to temperature were not significantly high. 15. The correlation coefficients of heading date with the number of grains per spike, 1,000-grain weight, and grain yield were positive and high, indicating the difficulty of selecting high-yielding lines from early populations. However, no significant correlation was found between earliness and the number of spikes, indicating that selection for high tillering among early varieties could be effective for raising yield.
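
The abstract reports heritabilities for heading date but does not state how they were computed. As a reading aid only, one common broad-sense estimator built from the variances of the parents, the F$_1$, and the F$_2$ is sketched below. It is an assumption that the study used this or a similar estimator, and the symbols $V_{P_1}$, $V_{P_2}$, $V_{F_1}$, $V_{F_2}$, $V_E$ and $H^2$ are introduced here for illustration.

```latex
% Environmental variance, estimated from the genetically uniform generations
V_E = \frac{V_{P_1} + V_{P_2} + V_{F_1}}{3}

% Broad-sense heritability: the share of F_2 variance that is not environmental
H^2 = \frac{V_{F_2} - V_E}{V_{F_2}}
```

Under this estimator, a heritability of 0.72 would mean that about 72 percent of the variance in heading date observed in the F$_2$ is attributed to genetic differences among plants.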

Radioimmunoassay Reagent Survey and Evaluation (검사별 radioimmunoassay시약 조사 및 비교실험)

  • Kim, Ji-Na;An, Jae-seok;Jeon, Young-woo;Yoon, Sang-hyuk;Kim, Yoon-cheol
    • The Korean Journal of Nuclear Medicine Technology / v.25 no.1 / pp.34-40 / 2021
  • Purpose: When a new test is introduced or reagents are changed in the laboratory of a medical institution, the characteristics of the test should be analyzed according to procedure and the reagents should be assessed. However, several conditions must be met to perform all the required comparative evaluations: first, enough samples must be prepared for each test, and second, the various reagents to be compared must be supplied. Even when sufficient comparative evaluations have been done, the data variation observed for the new reagent can represent the overall patient data variation only to a limited extent, which makes changing reagents a burden for the laboratory. Because of these difficulties, reagent changes in the laboratory are limited. To introduce competitive bidding, our institution conducted a full survey of radioimmunoassay (RIA) reagents for each test and established the range of reagents usable in the laboratory through comparative evaluations. We share this process here. Materials and Methods: Excluding consignment tests, 20 test items are performed in our laboratory. For each test, the RIA reagents that could be used were fully surveyed with reference to the external quality control report, and the manual for each reagent was obtained. Each manual was reviewed for the test method, incubation time, and sample volume needed for the test. The primary selection was then made according to whether the reagent could be used in our laboratory. For each reagent that passed the primary selection, 2 kits (on a 100-test basis) were supplied, and the data correlation test, sensitivity measurement, recovery rate measurement, and dilution test were conducted. The secondary selection was made according to the results of these comparative evaluations. The reagents that passed the primary and secondary selections were submitted to the competitive bidding list. 
Where only a single reagent was designated, we submitted an explanatory statement with the data obtained during the primary and secondary selection processes. Results: Reagents were excluded in the primary selection when the turnaround time (TAT) was expected to be delayed or when they could not be used on our equipment because of the large volume of reagent used during the test. After the primary selection, only one reagent was available for five items (squamous cell carcinoma Ag (SCC Ag), β-human chorionic gonadotropin (β-HCG), vitamin B12, folate, free testosterone); two reagents were available for CA19-9, CA125, CA72-4, ferritin, thyroglobulin antibody (TG Ab), microsomal antibody (Mic Ab), thyroid stimulating hormone-receptor-antibody (TSH-R-Ab), and calcitonin; three reagents were available for triiodothyronine (T3), Free T3, Free T4, TSH, and intact parathyroid hormone (intact PTH); and four reagents were available for carcinoembryonic antigen (CEA) and TG. After the secondary selection, only one reagent was available for eight items (ferritin, TG, CA19-9, SCC, β-HCG, vitamin B12, folate, free testosterone); two reagents were available for TG Ab, Mic Ab, TSH-R-Ab, CA125, CA72-4, intact PTH, and calcitonin; and three reagents were available for T3, Free T3, Free T4, TSH, and CEA. The reasons for exclusion in the secondary selection were a lack of reagent supply for the comparative evaluations, problems with data reproducibility, and unacceptable data variation. The most problematic part of the comparative evaluations was sample collection. Collection was not a problem when the number of tests requested was large and the sample volume needed per test was small. However, it was difficult to collect samples across a range of concentrations for tests with a small number of requests (100 cases per month or fewer), and it was difficult to conduct a recovery rate test when a relatively large sample volume was required for a single test (more than 100 µL). 
In addition, the lack of dilution solution or zero standard material for sensitivity measurement or dilution tests was one of the problems. Conclusion: Comparative evaluation for changing test reagents requires appropriate preparation time to collect diverse and sufficient samples. In addition, setting the range of total sample volume and reagent volume required for the comparative evaluations, based on the sample volume and reagent volume required for one test, will reduce the burden of sample collection and planning for each comparative evaluation.
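
The recovery rate measurement and dilution test named in this abstract are not defined there. The sketch below uses the conventional definitions of spike recovery and dilution-corrected recovery, which are assumed rather than taken from the paper. The function names, the 90 to 110 percent acceptance window, and the example concentrations are all hypothetical.

```python
def recovery_percent(base, spiked, added):
    """Spike recovery: percentage of the added analyte that the assay measures back.

    base   -- measured concentration of the sample before spiking
    spiked -- measured concentration after adding a known amount
    added  -- the known concentration increase contributed by the spike
    """
    if added <= 0:
        raise ValueError("added amount must be positive")
    return 100.0 * (spiked - base) / added


def dilution_recovery_percent(measured_diluted, dilution_factor, measured_neat):
    """Dilution test: dilution-corrected result relative to the undiluted result."""
    if dilution_factor <= 0 or measured_neat <= 0:
        raise ValueError("dilution factor and undiluted result must be positive")
    return 100.0 * measured_diluted * dilution_factor / measured_neat


def within_limits(percent, low=90.0, high=110.0):
    """Acceptance check; the default window is a placeholder, not a value from the source."""
    return low <= percent <= high


if __name__ == "__main__":
    # Hypothetical readings in arbitrary concentration units.
    spike = recovery_percent(base=10.0, spiked=19.5, added=10.0)
    dilution = dilution_recovery_percent(measured_diluted=24.0, dilution_factor=4, measured_neat=100.0)
    print(f"spike recovery: {spike:.1f}% acceptable={within_limits(spike)}")
    print(f"dilution recovery: {dilution:.1f}% acceptable={within_limits(dilution)}")
```

Both calculations need a sample measured more than once, which is consistent with the abstract's remark that recovery tests were hard to run for assays requiring more than 100 µL per test.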