Search | Korea Science

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

Ahn, Hyunchul
- Information Systems Review / v.16 no.3 / pp.161-177 / 2014
Corporate credit rating assessment consists of complicated processes in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive since domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural network (ANN), and multiclass support vector machine (MSVM) have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance the accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results of studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As a tool for optimizing the kernel parameters and the feature subset selection, we suggest the genetic algorithm (GA). GA is known as an efficient and effective search method that attempts to simulate biological evolution. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results. 
In particular, the mutation operator helps prevent GA from falling into local optima, so it can find a globally optimal or near-optimal solution. GA has been widely applied to search for optimal parameters or feature subsets of AI techniques including MSVM. For these reasons, we also adopt GA as an optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is in bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, and their credit ratings. Using various statistical methods including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e. credit rating, was labeled as four classes: 1 (A1); 2 (A2); 3 (A3); 4 (B and C). Eighty percent of the data for each class was used for training, and the remaining 20 percent was used for validation. To overcome the small sample size, we applied five-fold cross validation to our dataset. In order to examine the competitiveness of the proposed model, we also experimented with several comparative models including MDA, MLOGIT, CBR, ANN and MSVM. For MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source software package, and Evolver 5.5, a commercial software package that enables GA. The other comparative models were run using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model, GAMSVM, outperformed all the competing models. 
In addition, the model was found to use fewer independent variables while showing higher accuracy. In our experiments, five variables, X7 (total debt), X9 (sales per employee), X13 (years since founding), X15 (accumulated earnings to total assets), and X39 (the index related to the cash flows from operating activity), were found to be the most important factors in predicting corporate credit ratings. However, the values of the finally selected kernel parameters were found to be almost the same among the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
https://doi.org/10.14329/isr.2014.16.3.161
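
The McNemar comparison mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `mcnemar_test` and the counts in the usage lines are hypothetical, and the continuity-corrected chi-square form is an assumption, since the abstract does not say which variant of the test was used.

```python
import math
from typing import Tuple


def mcnemar_test(b: int, c: int) -> Tuple[float, float]:
    """Continuity-corrected McNemar test for two classifiers on the same cases.

    b: cases classifier A got right and classifier B got wrong.
    c: cases classifier A got wrong and classifier B got right.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant cases, so no evidence of a difference
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 d.o.f.: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical discordant counts, for illustration only.
stat, p_value = mcnemar_test(25, 10)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")
```

Only the discordant cases enter the statistic, which is why the test suits paired comparisons of models evaluated on the same validation set.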

An Intelligent Decision Support System for Selecting Promising Technologies for R&D based on Time-series Patent Analysis (R&D 기술 선정을 위한 시계열 특허 분석 기반 지능형 의사결정지원시스템)

Lee, Choongseok;Lee, Suk Joo;Choi, Byounggu
- Journal of Intelligence and Information Systems / v.18 no.3 / pp.79-96 / 2012
As the pace of competition accelerates dramatically and the complexity of change grows, a variety of research has been conducted to improve firms' short-term performance and to enhance their long-term survival. In particular, researchers and practitioners have paid attention to identifying promising technologies that give a firm a competitive advantage. Discovery of promising technology depends on how a firm evaluates the value of technologies, and thus many evaluation methods have been proposed. Approaches based on experts' opinions have been widely accepted for predicting the value of technologies. Although this approach provides in-depth analysis and ensures the validity of the results, it is usually cost- and time-ineffective and is limited to qualitative evaluation. Many studies have attempted to forecast the value of technology using patent information to overcome the limitations of the expert-opinion approach. Patent-based technology evaluation has served as a valuable approach to technological forecasting because patents contain a full and practical description of technology in a uniform structure. Furthermore, they provide information that is not divulged in any other source. Although the patent-information approach has contributed to our understanding of how to predict promising technologies, it has some limitations because predictions are based on past patent information and the interpretations of patent analyses are not consistent. To fill this gap, this study proposes a technology forecasting methodology that integrates the patent-information approach with an artificial intelligence method. The methodology consists of three modules: evaluation of technology promise, implementation of a technology value prediction model, and recommendation of promising technologies. In the first module, technology promise is evaluated from three different and complementary perspectives: impact, fusion, and diffusion. 
The impact of technologies refers to their influence on the development and improvement of future technologies, and is also clearly associated with their monetary value. The fusion of technologies denotes the extent to which a technology fuses different technologies, and represents the breadth of search underlying the technology. Fusion can be calculated per technology or per patent, so this study measures two types of fusion index: fusion index per technology and fusion index per patent. Finally, the diffusion of technologies denotes their degree of applicability across scientific and technological fields. In the same vein, diffusion index per technology and diffusion index per patent are both considered. In the second module, the technology value prediction model is implemented using an artificial intelligence method. This study uses the values of the five indexes (i.e., impact index, fusion index per technology, fusion index per patent, diffusion index per technology and diffusion index per patent) at different times (e.g., t-n, t-n-1, t-n-2, $\cdots$) as input variables. The output variables are the values of the five indexes at time t, which are used for learning. The learning method adopted in this study is the backpropagation algorithm. In the third module, this study recommends the final promising technologies based on the analytic hierarchy process (AHP). AHP provides the relative importance of each index, leading to a final promising index for each technology. The applicability of the proposed methodology is tested using U.S. patents in international patent class G06F (i.e., electronic digital data processing) from 2000 to 2008. The results show that the mean absolute error of the predictions produced by the proposed methodology is lower than that of multiple regression analysis for the fusion indexes. However, the mean absolute error of the proposed methodology is otherwise slightly higher than that of multiple regression analysis. 
These unexpected results may be explained, in part, by the small number of patents. Since this study uses only patent data in class G06F, the sample is relatively small, which leads to incomplete learning for a complex artificial intelligence structure. In addition, fusion index per technology and impact index are found to be important criteria for predicting promising technology. This study attempts to extend existing knowledge by proposing a new methodology for predicting technology value that integrates patent information analysis with an artificial neural network. By providing a quantitative prediction methodology, it helps managers who plan technology development and policy makers who implement technology policy. In addition, this study could help other researchers by providing a deeper understanding of the complex field of technological forecasting.
https://doi.org/10.13088/jiis.2012.18.3.079
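
The third module, in which AHP turns relative importance into a single promising index, can be illustrated with a small sketch. Everything named here is an assumption: `ahp_weights` and `promising_index` are hypothetical helpers, the pairwise comparison matrix is made up, and the row geometric mean is used as a common approximation to the principal-eigenvector weights; the abstract does not state how the authors derived their weights.

```python
import math
from typing import List, Sequence


def ahp_weights(pairwise: Sequence[Sequence[float]]) -> List[float]:
    """Approximate AHP priority weights with the row geometric mean method.

    pairwise[i][j] states how much more important criterion i is than j,
    so pairwise[j][i] is expected to equal 1 / pairwise[i][j].
    """
    geo_means = [math.prod(row) ** (1.0 / len(row)) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def promising_index(index_values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of predicted index values for one technology."""
    return sum(v * w for v, w in zip(index_values, weights))


# Hypothetical two-criterion example: criterion 0 is twice as important as criterion 1.
weights = ahp_weights([[1.0, 2.0], [0.5, 1.0]])
print(weights, promising_index([0.9, 0.3], weights))
```

For a perfectly consistent comparison matrix, the geometric mean weights coincide with the eigenvector weights, which makes the approximation easy to check by hand.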

Development of the Efficiency-Evaluation Model for the Mechanism of CO₂ Sequestration in a Deep Saline Aquifer (심부 대염수층 CO₂ 격리 메커니즘에 관한 효율성 평가 모델 개발)

Kim, Jung-Gyun;Lee, Young-Soo;Lee, Jeong-Hwan
- Journal of the Korean Institute of Gas / v.16 no.6 / pp.55-66 / 2012
A practical way to reduce greenhouse gases is to cut carbon dioxide emissions. For this reason, CCS (Carbon Capture and Storage) technology, which can reduce carbon dioxide emissions, has emerged as a realistic alternative in recent years. In addition, researchers have recently been investigating ways of applying CCS technologies to deep saline aquifers. In this study, a model for evaluating the feasibility of $CO_2$ sequestration in a deep saline aquifer was developed using an ANN (Artificial Neural Network). To develop the efficiency-evaluation model, a base model of a deep saline aquifer was created and a sensitivity analysis of the aquifer characteristics was performed with the commercial simulator GEM. Based on the sensitivity analysis, the factors affecting $CO_2$ sequestration in the deep saline aquifer, and their ranges, were chosen. The results of the ANN training scenarios confirmed $CO_2$ sequestration by the solubility trapping and residual trapping mechanisms. The ANN model evaluation showed that the correlation coefficient increased up to 0.99. It was confirmed that the developed model can be used to assess the feasibility of $CO_2$ sequestration in deep saline aquifers.
https://doi.org/10.7842/kigas.2012.16.6.55

Development of DL-MCS Hybrid Expert System for Automatic Estimation of Apartment Remodeling (공동주택 리모델링 자동견적을 위한 DL-MCS Hybrid Expert System 개발)

Kim, Jun;Cha, Heesung
- Korean Journal of Construction Engineering and Management / v.21 no.6 / pp.113-124 / 2020
There is a growing social movement to improve building performance by remodeling aging apartment houses. To this end, remodeling construction cost analysis, structural analysis, and policy and institutional reviews have been conducted to suggest ways to promote remodeling. However, although methods of analyzing construction cost for apartment remodeling are currently being proposed for research purposes, their practical applicability is limited. Specifically, such methods are applicable to cases that are already completed or in progress, but practical use also requires cost analysis for cases that will occur in the future, so the sustainability of the analysis method is lacking. To address this, we suggest an automated estimating method. For the sustainability of construction cost estimates, deep learning was introduced into the estimating procedure. Specifically, a method was presented for automatically finding the relationships between design elements, work types, and cost increase factors that can occur in apartment remodeling. In addition, Monte Carlo simulation was included in the estimation procedure to compensate for the inability to represent uncertainty, which is an inherent limitation of deep learning-based estimation. To achieve higher accuracy as cases accumulate, a method of improving accuracy by comparing the estimate with the existing accumulated data was also suggested. To validate the sustainability of the automated estimates proposed in this study, learning procedures were performed on 13 cases and cumulative procedures on 2 additional cases. As a result, a new construction cost estimating procedure that reflects the characteristics of the two additional projects was presented automatically. 
In this study, the estimating method was applied to 15 cases; as more cases are accumulated and reflected, the effect of this study is expected to increase.
https://doi.org/10.6106/KJCEM.2020.21.6.113
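
The role of Monte Carlo simulation in the estimate, attaching a spread to what would otherwise be a single point value, can be sketched as below. This is an illustration under stated assumptions, not the DL-MCS system itself: the work items, their (low, most likely, high) costs, the triangular distribution, and the helper names `simulate_total_cost` and `percentile` are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical work items as (low, most likely, high) costs in arbitrary units.
ITEMS: List[Tuple[float, float, float]] = [
    (90.0, 100.0, 130.0),
    (40.0, 50.0, 70.0),
    (20.0, 25.0, 45.0),
]


def simulate_total_cost(
    items: Sequence[Tuple[float, float, float]],
    n_trials: int = 10000,
    seed: int = 0,
) -> List[float]:
    """Draw each item cost from a triangular distribution and sum per trial.

    Returns the simulated totals sorted in ascending order.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.triangular(low, high, mode) for low, mode, high in items)
        for _ in range(n_trials)
    ]
    totals.sort()
    return totals


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Nearest-rank style percentile of an ascending list, with q in [0, 1]."""
    idx = round(q * (len(sorted_values) - 1))
    return sorted_values[min(len(sorted_values) - 1, max(0, idx))]


totals = simulate_total_cost(ITEMS)
print(percentile(totals, 0.1), percentile(totals, 0.5), percentile(totals, 0.9))
```

Reporting a low, median, and high percentile rather than one number is the usual way such a simulation communicates estimate uncertainty.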

Optimized Feature Selection using Feature Subset IG-MLP Evaluation based Machine Learning Model for Disease Prediction (특징집합 IG-MLP 평가 기반의 최적화된 특징선택 방법을 이용한 질환 예측 머신러닝 모델)

Kim, Kyeongryun;Kim, Jaekwon;Lee, Jongsik
- Journal of the Korea Society for Simulation / v.29 no.1 / pp.11-21 / 2020
Cardio-cerebrovascular diseases (CCD) account for 24% of deaths among Koreans, the highest proportion of any cause except cancer. Currently, the risk of cardiovascular disease for domestic patients is assessed with the Framingham risk score (FRS), but its accuracy tends to decrease because it is a foreign guideline. Also, it cannot score the risk of cerebrovascular disease. CCD is hard to predict because it is difficult to analyze the features of early symptoms for prevention. Therefore, a prediction method suited to Koreans is needed. The purpose of this paper is to validate, through simulation with CCD data, a feature selection method based on IG-MLP (Information Gain - Multilayer Perceptron) evaluation. The proposed method uses the raw data of the 4th to 7th Korea National Health and Nutrition Examination Survey (KNHANES). To select the important features of CCD, the attributes are analyzed using IG-MLP, and finally a CCD prediction ANN model using the optimized feature set is provided. The proposed method can find the important features for CCD prediction in Koreans, and the ANN model could predict CCD for Koreans more accurately.
https://doi.org/10.9709/JKSS.2020.29.1.011
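
The information gain half of IG-MLP can be illustrated on its own. The sketch below ranks discrete features by information gain with respect to a class label; it does not include the MLP-based evaluation of feature subsets that the paper combines with it. The function names and the toy feature names (`smoker`, `region`) are hypothetical and are not taken from KNHANES.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable]) -> List[str]:
    """Feature names sorted from highest to lowest information gain."""
    return sorted(columns, key=lambda name: information_gain(columns[name], labels), reverse=True)


# Toy data: 'smoker' separates the classes perfectly, 'region' carries no information.
labels = [1, 1, 0, 0]
columns = {"region": ["a", "b", "a", "b"], "smoker": [1, 1, 0, 0]}
print(rank_features(columns, labels))
```

A ranking like this would typically supply candidate subsets, which a classifier such as an MLP then scores to pick the final feature set.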

Rice Yield Estimation of South Korea from Year 2003-2016 Using Stacked Sparse AutoEncoder (SSAE 알고리즘을 통한 2003-2016년 남한 전역 쌀 생산량 추정)

Ma, Jong Won;Lee, Kyungdo;Choi, Ki-Young;Heo, Joon
- Korean Journal of Remote Sensing / v.33 no.5_2 / pp.631-640 / 2017
The estimation of rice yield affects the income of farmers as well as the fields related to agriculture. Moreover, it has an important effect on government policy making, including the control of supply and demand and price estimation. Thus, it is necessary to build a crop yield estimation model, and many past studies have used empirical statistical models or artificial neural network algorithms with climatic and satellite data. Recently, scientists have achieved successful results with deep learning algorithms in fields such as pattern recognition, computer vision, and speech recognition. Among deep learning algorithms, the SSAE (Stacked Sparse AutoEncoder) algorithm has been confirmed to be applicable to forecasting with time series data, and in this study SSAE was used to estimate the rice yield in South Korea. Climatic and satellite data were used as the input variables, and different types of input data were constructed according to the rice growth period in South Korea. As a result, the combination of satellite data from May to September and climatic data using the 16-day average value showed the best performance, with an average annual %RMSE (percent Root Mean Square Error) of 7.43% and a regional %RMSE of 7.16%, which demonstrates the applicability of the SSAE algorithm to rice yield estimation.
https://doi.org/10.7780/kjrs.2017.33.5.2.3
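
The %RMSE figures quoted above can be read with the following sketch in mind. It assumes the common definition in which RMSE is divided by the mean of the observed values and expressed as a percentage; the abstract does not spell out the normalization the authors used, and `percent_rmse` is a hypothetical helper name.

```python
import math
from typing import Sequence


def percent_rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equally long")
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)


# Made-up yields, for illustration only.
print(percent_rmse([100.0, 100.0], [110.0, 90.0]))
```

Normalizing by the mean makes errors comparable across regions or years whose absolute yields differ.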

Prediction and analysis of acute fish toxicity of pesticides to the rainbow trout using 2D-QSAR (2D-QSAR방법을 이용한 농약류의 무지개 송어 급성 어독성 분석 및 예측)

Song, In-Sik;Cha, Ji-Young;Lee, Sung-Kwang
- Analytical Science and Technology / v.24 no.6 / pp.544-555 / 2011
The acute toxicity of pesticides to the rainbow trout (Oncorhynchus mykiss) was analyzed and predicted using quantitative structure-activity relationships (QSAR). The aquatic toxicity, 96 h $LC_{50}$ (median lethal concentration), of 275 organic pesticides was obtained from the EU-funded project DEMETRA. Prediction models were derived from 558 2D molecular descriptors calculated in PreADMET. The linear (multiple linear regression) and nonlinear (support vector machine and artificial neural network) learning methods were optimized by taking into account the statistical parameters between the experimental and predicted $pLC_{50}$. After preprocessing, population-based forward selection was used to select the best subsets of descriptors for the learning methods, including a 5-fold cross-validation procedure. The support vector machine model was selected as the best model ($R^2_{CV}$=0.677, RMSECV=0.887, MSECV=0.674) and also correctly classified 87% of the training set according to EU regulation criteria. The MLR model could describe the structural characteristics of toxic chemicals and their interaction with the lipid membrane of fish. All the developed models were validated by 5-fold cross-validation and the Y-scrambling test.
https://doi.org/10.5806/AST.2011.24.6.544
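
The Y-scrambling check named at the end of the abstract can be sketched in a few lines. The idea is to refit the model after randomly permuting the response: if the scrambled fits score nearly as well as the real one, the original correlation is probably due to chance. The sketch below is a simplified stand-in that uses a single descriptor and ordinary least squares rather than the paper's multi-descriptor MLR, SVM, or ANN models; the data and function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of a one-descriptor least-squares fit (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def y_scrambling(
    x: Sequence[float], y: Sequence[float], n_rounds: int = 100, seed: int = 0
) -> Tuple[float, float]:
    """Return (R^2 on the real response, mean R^2 over scrambled responses)."""
    rng = random.Random(seed)
    scores: List[float] = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the descriptor-response pairing
        scores.append(r_squared(x, shuffled))
    return r_squared(x, y), sum(scores) / len(scores)


# Synthetic descriptor and response with a strong linear relationship.
x = [float(v) for v in range(30)]
y = [2.0 * v + 1.0 + ((v * 7) % 5 - 2) * 0.1 for v in range(30)]
print(y_scrambling(x, y))
```

A large gap between the two returned values is the outcome one hopes for; a small gap would suggest the model is fitting noise.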

Prediction of Shore Tide level using Artificial Neural Network (인공신경망을 이용한 해안 조위예측)

Rhee Kyoung Hoon;Moon Byoung Seok;Kim Tae Kyoung;Oh jong yang
- Proceedings of the Korea Water Resources Association Conference / 2005.05b / pp.1068-1072 / 2005
Tide refers to the slow periodic rise and fall of the sea surface; this rise and fall usually occurs about twice a day, although in some places it occurs once a day. Tides are also taken to include somewhat irregular rises and falls with periods of several days, and fairly regular ones with periods of half a year or one year. However, oscillations with periods of several minutes to several tens of minutes, which are characteristic of each harbor, are not treated as tides. Among ocean phenomena, the tide is the most predictable, because it is tied to the motion of celestial bodies. A tide is the superposition of regular individual tides caused by countless hypothetical celestial bodies that travel along the equator at a fixed distance from the Earth, each with its own speed; each of these individual tides is called a constituent. The neural network model used here is a black-box model consisting of inputs and outputs. Because it can represent a system in a parallel and nonlinear way, it has previously been used to analyze runoff and to model the rainfall-runoff process in river basins. In this study, tide levels were predicted using an artificial neural network rather than harmonic analysis, the conventional tide-level prediction method. Through the optimization process called learning, a neural network absorbs and accumulates natural phenomena of complex structure and function as they are and uses them as knowledge, which gives it an excellent ability to reproduce the phenomena; its associative memory also makes it well suited to uncertain tide-level curves that cannot be expressed mathematically. The purpose of this study is to predict tide-level curves from several readily understood climatic factors (sea-level pressure, wind direction, wind speed, lunar calendar date, etc.) with a neural network model, in place of the prediction previously made through tidal theory, and to apply the model to tide levels in the Yeosu area for comparison and analysis.
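
The constituent picture described above, on which conventional harmonic analysis rests, can be written as a sum of cosines. The sketch below is only an illustration of that superposition, not the neural network approach the study adopts: the M2 and S2 periods are the standard values, while the mean level, amplitudes, and phases are made-up numbers and are not values for Yeosu.

```python
import math
from typing import Sequence, Tuple

M2_PERIOD_H = 12.4206012  # principal lunar semidiurnal constituent, hours
S2_PERIOD_H = 12.0        # principal solar semidiurnal constituent, hours

# Each constituent is (amplitude in m, period in hours, phase lag in radians).
Constituent = Tuple[float, float, float]


def tide_level(t_hours: float, mean_level: float, constituents: Sequence[Constituent]) -> float:
    """Tide level at time t as the mean level plus a sum of cosine constituents."""
    return mean_level + sum(
        amp * math.cos(2.0 * math.pi * t_hours / period - phase)
        for amp, period, phase in constituents
    )


# Hypothetical amplitudes and phases, for illustration only.
example = [(1.0, M2_PERIOD_H, 0.0), (0.3, S2_PERIOD_H, 0.0)]
for hour in range(0, 25, 6):
    print(hour, round(tide_level(float(hour), 1.5, example), 3))
```

Because the two periods differ slightly, the constituents drift in and out of phase, which produces the familiar spring and neap cycle.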

COMPARISON OF LINEAR AND NON-LINEAR NIR CALIBRATION METHODS USING LARGE FORAGE DATABASES

Berzaghi, Paolo;Flinn, Peter C.;Dardenne, Pierre;Lagerholm, Martin;Shenk, John S.;Westerhaus, Mark O.;Cowe, Ian A.
- Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / pp.1141-1141 / 2001
The aim of the study was to evaluate the performance of 3 calibration methods, modified partial least squares (MPLS), local PLS (LOCAL) and artificial neural network (ANN) on the prediction of chemical composition of forages, using a large NIR database. The study used forage samples (n=25,977) from Australia, Europe (Belgium, Germany, Italy and Sweden) and North America (Canada and U.S.A) with information relative to moisture, crude protein and neutral detergent fibre content. The spectra of the samples were collected with 10 different Foss NIR Systems instruments, which were either standardized or not standardized to one master instrument. The spectra were trimmed to a wavelength range between 1100 and 2498 nm. Two data sets, one standardized (IVAL) and the other not standardized (SVAL) were used as independent validation sets, but 10% of both sets were omitted and kept for later expansion of the calibration database. The remaining samples were combined into one database (n=21,696), which was split into 75% calibration (CALBASE) and 25% validation (VALBASE). The chemical components in the 3 validation data sets were predicted with each model derived from CALBASE using the calibration database before and after it was expanded with 10% of the samples from IVAL and SVAL data sets. Calibration performance was evaluated using standard error of prediction corrected for bias (SEP(C)), bias, slope and R2. None of the models appeared to be consistently better across all validation sets. VALBASE was predicted well by all models, with smaller SEP(C) and bias values than for IVAL and SVAL. This was not surprising as VALBASE was selected from the calibration database and it had a sample population similar to CALBASE, whereas IVAL and SVAL were completely independent validation sets. 
In most cases, LOCAL and ANN models, but not MPLS, showed considerable improvement in the prediction of IVAL and SVAL after the calibration database had been expanded with the 10% of IVAL and SVAL samples reserved for calibration expansion. The effects of sample processing, instrument standardization and differences in reference procedure were partially confounded in the validation sets, so it was not possible to determine which factors were most important. Further work on the development of large databases must address the problems of standardization of instruments, harmonization and standardization of laboratory procedures and, even more importantly, the definition of the database population.
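
The two headline statistics used to compare the calibrations, bias and SEP(C), can be computed as in the sketch below. It assumes the conventions common in NIR work: bias is the mean of predicted minus reference values, and SEP(C) is the standard deviation of the errors after removing the bias, with an n - 1 denominator. The abstract does not state these details, and `bias_and_sepc` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple


def bias_and_sepc(reference: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float]:
    """Return (bias, SEP(C)) for paired reference and predicted values.

    bias   = mean(predicted - reference)
    SEP(C) = sqrt(sum((error - bias)^2) / (n - 1))
    """
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("need at least two paired values")
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    bias = sum(errors) / n
    sepc = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    return bias, sepc


# Made-up crude protein values, for illustration only.
print(bias_and_sepc([10.0, 12.0, 14.0], [11.0, 12.0, 13.0]))
```

Separating the two lets a constant offset between instruments (bias) be distinguished from random scatter (SEP(C)), which matters when validation sets come from unstandardized instruments.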

Forecasting of Customer's Purchasing Intention Using Support Vector Machine (Support Vector Machine 기법을 이용한 고객의 구매의도 예측)

Kim, Jin-Hwa;Nam, Ki-Chan;Lee, Sang-Jong
- Information Systems Review / v.10 no.2 / pp.137-158 / 2008
Rapid development of various information technologies creates new opportunities in online and offline markets. In this changing market environment, customers have various demands for new products and services. Therefore, their power and influence on the markets grow stronger each year. Companies have paid great attention to customer relationship management (CRM). In particular, personalized product recommendation systems, which recommend products and services based on customers' private information or in-store purchasing behavior, are an important asset to most companies. CRM is one of the important business processes in which reliable information is mined from customer databases. Data mining techniques such as artificial intelligence are popular tools for extracting useful information and knowledge from these customer databases. In this research, we propose a recommendation system that predicts customers' purchase intention. Customers' purchasing intention for a specific product is predicted by applying data mining techniques to a receipt data set. The performance of the suggested method is compared with that of other data mining techniques.

