Search | Korea Science

A Hybrid SVM Classifier for Imbalanced Data Sets (불균형 데이터 집합의 분류를 위한 하이브리드 SVM 모델)

Lee, Jae Sik;Kwon, Jong Gu
- Journal of Intelligence and Information Systems, v.19 no.2, pp.125-140, 2013
We call a data set in which the records of one class far outnumber the records of the other class an 'imbalanced data set'. Most classification techniques perform poorly on imbalanced data sets. When evaluating a classification technique, we need to measure not only 'accuracy' but also 'sensitivity' and 'specificity'. In a customer churn prediction problem, 'retention' records form the majority class and 'churn' records form the minority class. Sensitivity measures the proportion of actual retentions that are correctly identified as such, and specificity measures the proportion of actual churns that are correctly identified as such. The poor performance of classification techniques on imbalanced data sets is due to low specificity. Many previous studies on imbalanced data sets employed an 'oversampling' technique, in which members of the minority class are sampled more than those of the majority class in order to produce a relatively balanced data set. When a classification model is constructed from this oversampled, balanced data set, specificity can be improved but sensitivity decreases. In this research, we developed a hybrid model of support vector machine (SVM), artificial neural network (ANN), and decision tree that improves specificity while maintaining sensitivity. We named this hybrid model the 'hybrid SVM model'.
The hybrid SVM model is constructed and used for prediction as follows. A balanced data set is prepared by oversampling from the original imbalanced data set. The SVM_I and ANN_I models are constructed using the imbalanced data set, and the SVM_B model is constructed using the balanced data set. The SVM_I model is superior in sensitivity and the SVM_B model is superior in specificity. For a record on which the SVM_I and SVM_B models make the same prediction, that prediction becomes the final solution. For a record on which they make different predictions, the final solution is determined by discrimination rules obtained from the ANN and a decision tree: a decision tree model is constructed on these records using the ANN_I output value as input and actual retention or churn as target. We obtained the following two discrimination rules: 'IF ANN_I output value < 0.285, THEN Final Solution = Retention' and 'IF ANN_I output value ${\geq}0.285$, THEN Final Solution = Churn.' The threshold 0.285 is the value optimized for the data used in this research. The contribution of this research is the structure, or framework, of the hybrid SVM model, not a specific threshold value such as 0.285; the threshold in the above rules can therefore be changed depending on the data. To evaluate the performance of the hybrid SVM model, we used the 'churn data set' in the UCI Machine Learning Repository, which consists of 85% retention customers and 15% churn customers. The accuracy of the hybrid SVM model is 91.08%, which is better than that of the SVM_I model or the SVM_B model. The points worth noting are its sensitivity, 95.02%, and its specificity, 69.24%. The sensitivity of the SVM_I model is 94.65%, and the specificity of the SVM_B model is 67.00%. Therefore, the hybrid SVM model developed in this research improves on the specificity of the SVM_B model while maintaining the sensitivity of the SVM_I model.
https://doi.org/10.13088/jiis.2013.19.2.125
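
The combination step described in the hybrid SVM abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the label strings, and the toy records are assumptions, and the three model outputs are taken as given rather than produced by trained SVM or ANN models. The threshold 0.285 is the value the abstract reports for its own data, so it is exposed as a parameter. The metric helper follows the abstract's convention that retention is the positive class.

```python
RETENTION = "retention"
CHURN = "churn"


def hybrid_predict(svm_i_label, svm_b_label, ann_i_output, threshold=0.285):
    """Use the shared SVM prediction; fall back to the ANN_I rule on disagreement."""
    if svm_i_label == svm_b_label:
        return svm_i_label
    # Discrimination rule from the abstract: output below the threshold means retention.
    return RETENTION if ann_i_output < threshold else CHURN


def sensitivity_specificity(actual, predicted):
    """Sensitivity on retention records, specificity on churn records."""
    tp = sum(a == RETENTION and p == RETENTION for a, p in zip(actual, predicted))
    tn = sum(a == CHURN and p == CHURN for a, p in zip(actual, predicted))
    n_retention = sum(a == RETENTION for a in actual)
    n_churn = sum(a == CHURN for a in actual)
    return tp / n_retention, tn / n_churn


# Made-up records: (SVM_I label, SVM_B label, ANN_I output, actual label).
records = [
    (RETENTION, RETENTION, 0.05, RETENTION),
    (RETENTION, CHURN, 0.10, RETENTION),
    (RETENTION, CHURN, 0.60, CHURN),
    (CHURN, CHURN, 0.90, CHURN),
]
predicted = [hybrid_predict(i, b, o) for i, b, o, _ in records]
actual = [label for *_, label in records]
sens, spec = sensitivity_specificity(actual, predicted)
```

Only the second and third toy records reach the threshold rule, because the two SVM labels disagree on them; for the other records the ANN_I output is ignored.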

A Novel Image Classification Method for Content-based Image Retrieval via a Hybrid Genetic Algorithm and Support Vector Machine Approach

Seo, Kwang-Kyu
- Journal of the Semiconductor & Display Technology, v.10 no.3, pp.75-81, 2011
This paper presents a novel image classification method based on a hybrid genetic algorithm (GA) and support vector machine (SVM) approach that can significantly improve classification performance for content-based image retrieval (CBIR). Although SVM has been widely applied to CBIR, its classification accuracy depends on the kernel parameter settings and the feature subset selected in the learning process. This study uses GA to optimize the SVM parameters and the feature subset simultaneously, without degrading the classification accuracy of SVM for CBIR. Using the hybrid GA and SVM model, we can classify more images in the database effectively. Experiments were carried out on a large image database, and the experimental results show that the proposed model may significantly improve on the classification accuracy of conventional SVM. We also found that the proposed model outperformed all the other models, such as neural network and typical SVM models.
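
The idea of optimizing SVM parameters and the feature subset in one genetic search can be sketched as below. This is a hedged illustration only: the chromosome layout (log-scaled C, log-scaled kernel parameter, binary feature mask), the operators, and all names are assumptions, and `toy_fitness` is a made-up stand-in for the cross-validated SVM accuracy that a real system would compute.

```python
import random


def toy_fitness(chrom):
    """Stand-in for cross-validated SVM accuracy (hypothetical, not from the paper)."""
    log_c, log_gamma, mask = chrom
    # Peaks at log_c = 2 and log_gamma = -3, and rewards keeping only features 0 and 2.
    param_term = -((log_c - 2.0) ** 2) - (log_gamma + 3.0) ** 2
    target = [1, 0, 1, 0, 0]
    mask_term = -sum(m != t for m, t in zip(mask, target))
    return param_term + mask_term


def random_chromosome(rng, n_features):
    return (rng.uniform(-5, 5), rng.uniform(-5, 5),
            [rng.randint(0, 1) for _ in range(n_features)])


def crossover(rng, a, b):
    """Uniform crossover: each gene comes from one parent chosen at random."""
    mask = [rng.choice(pair) for pair in zip(a[2], b[2])]
    return (rng.choice((a[0], b[0])), rng.choice((a[1], b[1])), mask)


def mutate(rng, chrom, rate=0.2):
    """Perturb the real-valued genes and flip mask bits, each with probability `rate`."""
    log_c, log_gamma, mask = chrom
    if rng.random() < rate:
        log_c += rng.gauss(0, 0.5)
    if rng.random() < rate:
        log_gamma += rng.gauss(0, 0.5)
    mask = [1 - m if rng.random() < rate else m for m in mask]
    return (log_c, log_gamma, mask)


def run_ga(fitness, n_features=5, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_chromosome(rng, n_features) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # truncation selection
        children = [
            mutate(rng, crossover(rng, rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - 1)
        ]
        pop = [pop[0]] + children  # elitism: the best chromosome survives unchanged
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history


best, history = run_ga(toy_fitness)
```

Because the best chromosome is always carried over, the recorded best fitness never decreases from one generation to the next; whether it reaches the optimum depends on the seed and the settings.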

SVM Load Forecasting using Cross-Validation (교차검증을 이용한 SVM 전력수요예측)

Jo, Nam-Hoon
- The Transactions of the Korean Institute of Electrical Engineers A, v.55 no.11, pp.485-491, 2006
In this paper, we study the problem of model selection for a Support Vector Machine (SVM) predictor for short-term load forecasting. Model selection amounts to tuning SVM parameters, such as the cost coefficient C and the kernel parameters, in order to maximize the prediction performance of the SVM. We propose that the cross-validation method can be used as a model selection algorithm for SVM-based load forecasting. Through experiments on several data sets, we found that the difference between the prediction error of the SVM tuned by cross-validation and that of the ideal SVM is less than 5%. This shows that SVM parameters for load forecasting can be tuned efficiently by using cross-validation.
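
Model selection by cross-validation, as described in the load forecasting abstract, can be sketched as a loop over candidate parameter settings. The sketch below is an assumption-laden illustration: the helper names and the grid are hypothetical, and the model is a toy nearest-neighbour averager rather than an SVM so that the example runs with the standard library alone. In practice `fit_predict` would train an SVM with the given C and kernel parameters.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) for contiguous folds."""
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def cv_error(fit_predict, params, xs, ys, n_folds=5):
    """Mean absolute error over all held-out samples."""
    total = 0.0
    for train, test in k_fold_splits(len(xs), n_folds):
        preds = fit_predict(params, [xs[i] for i in train], [ys[i] for i in train],
                            [xs[i] for i in test])
        total += sum(abs(p - ys[i]) for p, i in zip(preds, test))
    return total / len(xs)


def select_params(fit_predict, grid, xs, ys, n_folds=5):
    """Return the grid entry with the lowest cross-validated error, plus all errors."""
    errors = [cv_error(fit_predict, params, xs, ys, n_folds) for params in grid]
    best = min(range(len(grid)), key=errors.__getitem__)
    return grid[best], errors


def knn_fit_predict(params, train_x, train_y, test_x):
    """Toy stand-in model: average the targets of the nearest training points."""
    k = params["n_neighbors"]
    preds = []
    for x in test_x:
        nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
        preds.append(sum(train_y[i] for i in nearest) / k)
    return preds


# Made-up data with a linear trend; a very large neighbourhood should do badly here.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
grid = [{"n_neighbors": k} for k in (1, 3, 15)]
best_params, errors = select_params(knn_fit_predict, grid, xs, ys, n_folds=5)
```

Contiguous folds are used here only for simplicity; shuffled or time-ordered splits may suit real load data better, and the abstract does not say which scheme the author used.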

Speaker Verification Using SVM Kernel with GMM-Supervector Based on the Mahalanobis Distance (Mahalanobis 거리측정 방법 기반의 GMM-Supervector SVM 커널을 이용한 화자인증 방법)

Kim, Hyoung-Gook;Shin, Dong
- The Journal of the Acoustical Society of Korea, v.29 no.3, pp.216-221, 2010
In this paper, we propose a speaker verification method using a Support Vector Machine (SVM) kernel with a Gaussian Mixture Model (GMM)-supervector based on the Mahalanobis distance. The proposed GMM-supervector SVM kernel method combines GMM with SVM. The GMM-supervectors are generated from the GMM parameters of the speaker's and other speakers' utterances. The speaker verification threshold for the GMM-supervectors is decided by the SVM kernel based on the Mahalanobis distance in order to improve speaker verification accuracy. Experimental results for text-independent speaker verification using 20 speakers demonstrate the performance of the proposed method compared to GMM, SVM, the GMM-supervector SVM kernel based on Kullback-Leibler (KL) divergence, and the GMM-supervector SVM kernel based on the Bhattacharyya distance.
https://doi.org/10.7776/ASK.2010.29.3.216

Bankruptcy prediction using ensemble SVM model (앙상블 SVM 모형을 이용한 기업 부도 예측)

Choi, Ha Na;Lim, Dong Hoon
- Journal of the Korean Data and Information Science Society, v.24 no.6, pp.1113-1125, 2013
Corporate bankruptcy prediction has long been an important topic in accounting and finance. Several data mining techniques have been used for bankruptcy prediction. However, a single model has many limitations when applied to real classification problems. This study proposes an ensemble SVM (support vector machine) model that combines SVM models with different kernel functions. The ensemble model is built and evaluated using a v-fold cross-validation approach. The k top-performing models are recruited into the ensemble, and classification is then carried out by majority voting among the ensemble members. In this paper, we investigate the performance of the ensemble SVM classifier in terms of accuracy, error rate, sensitivity, specificity, ROC curve, and AUC, and compare it with single SVM classifiers on a financial ratios data set and a simulation data set. The results confirmed the advantages of our method: it is robust while providing good performance.
https://doi.org/10.7465/jkdi.2013.24.6.1113
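
The two ensemble steps named in the bankruptcy abstract, recruiting the k top-performing models and voting, can be sketched as follows. All scores, predictions, and function names are hypothetical, and the tie-breaking rule (ties go to the best-ranked member) is an assumption, since the abstract does not describe how ties are handled.

```python
from collections import Counter


def select_top_k(cv_scores, k):
    """Return the names of the k models with the highest cross-validation score."""
    return sorted(cv_scores, key=cv_scores.get, reverse=True)[:k]


def majority_vote(predictions_by_model, members):
    """Combine member predictions record by record; ties go to the best-ranked member."""
    n_records = len(predictions_by_model[members[0]])
    combined = []
    for r in range(n_records):
        votes = [predictions_by_model[m][r] for m in members]
        counts = Counter(votes)
        top = max(counts.values())
        # `votes` is ordered by rank, so the first label with `top` votes breaks ties.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Hypothetical cross-validation accuracies for SVMs with different kernels.
cv_scores = {"linear": 0.81, "polynomial": 0.78, "rbf": 0.86, "sigmoid": 0.70}
# Hypothetical predictions for four firms (1 = bankrupt, 0 = healthy).
predictions = {
    "linear": [1, 0, 0, 1],
    "polynomial": [1, 1, 0, 0],
    "rbf": [0, 0, 1, 1],
    "sigmoid": [0, 1, 1, 0],
}
members = select_top_k(cv_scores, k=3)
ensemble = majority_vote(predictions, members)
```

With an odd number of members and two classes, ties cannot occur, so the tie-breaking rule only matters for even k or for more than two classes.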

Predicting Defect-Prone Software Module Using GA-SVM (GA-SVM을 이용한 결함 경향이 있는 소프트웨어 모듈 예측)

Kim, Young-Ok;Kwon, Ki-Tae
- KIPS Transactions on Software and Data Engineering, v.2 no.1, pp.1-6, 2013
An SVM classifier showed good performance in previous research on predicting defect-prone software modules. However, SVM has the disadvantages that its parameters must be chosen differently for every kernel and that the algorithm must be run repeatedly to predict the results for each changed parameter. Therefore, we find these parameters using a genetic algorithm (GA) and compare the classification results with those of the backpropagation algorithm. As a result, the GA-SVM model performs better.
https://doi.org/10.3745/KTSDE.2013.2.1.001

Application of a support vector machine for prediction of piping and internal stability of soils

Xue, Xinhua
- Geomechanics and Engineering, v.18 no.5, pp.493-502, 2019
Internal stability is an important safety issue for levees, embankments, and other earthen structures. Since a large part of the world's population lives near oceans, lakes, and rivers, floods resulting from the breaching of dams can lead to devastating disasters with tremendous loss of life and property, especially in densely populated areas. Several main factors affect the internal stability of dams, levees, and other earthen structures, such as the erodibility of the soil, the water velocity inside the soil mass, and the geometry of the earthen structure. The mechanism of internal erosion and stability of soils is therefore very complicated, and it is vital to investigate methods for assessing the internal stability of soils in embankment dams and their foundations. This paper presents an improved support vector machine (SVM) model to predict the internal stability of soils. The grid search algorithm (GSA) is first employed to find the optimal parameters of the SVM, and the cross-validation (CV) method is then employed to estimate the classification accuracy of the GSA-SVM model. Two examples of internal stability of soils are presented to validate the predictive capability of the proposed GSA-SVM model. In addition, to verify the effectiveness of the proposed GSA-SVM model, its predictions were compared with those from the traditional back propagation neural network (BPNN) model. The results showed that the proposed GSA-SVM model is a feasible and efficient tool for assessing the internal stability of soils with high accuracy.
https://doi.org/10.12989/gae.2019.18.5.493

Transfer Learning based DNN-SVM Hybrid Model for Breast Cancer Classification

Gui Rae Jo;Beomsu Baek;Young Soon Kim;Dong Hoon Lim
- Journal of the Korea Society of Computer and Information, v.28 no.11, pp.1-11, 2023
Breast cancer is the disease that affects women the most worldwide. With the development of computer technology, the efficiency of machine learning has increased, and it now plays an important role in cancer detection and diagnosis. Deep learning is a field of machine learning based on artificial neural networks; its performance has improved rapidly in recent years, and its range of applications is expanding. In this paper, we propose a DNN-SVM hybrid model that combines a transfer learning-based deep neural network (DNN) with a support vector machine (SVM) for breast cancer classification. The proposed transfer learning-based model is effective for small training data, learns quickly, and can improve performance by combining the advantages of the single models, that is, DNN and SVM. In performance tests with the WOBC and WDBC breast cancer data provided by the UCI Machine Learning Repository, the proposed DNN-SVM hybrid model was superior in various performance measures to single models such as logistic regression, DNN, and SVM, and to ensemble models such as random forest.
https://doi.org/10.9708/jksci.2023.28.11.001

Multiclass SVM Model with Order Information

Ahn, Hyun-Chul;Kim, Kyoung-Jae
- International Journal of Fuzzy Logic and Intelligent Systems, v.6 no.4, pp.331-334, 2006
The original Support Vector Machines (SVMs) by Vapnik were used for binary classification problems. Some researchers have tried to extend the original SVM to multiclass classification. However, their studies have focused only on classifying samples into nominal categories. This study proposes a novel multiclass SVM model that handles ordinal multiple classes. Our suggested model may use fewer classifiers but predict more accurately because it utilizes additional hidden information, the order of the classes. To validate our model, we apply it to a real-world bond rating case. In this study, we compare the results of the model to those of statistical and typical machine learning techniques, and to another multiclass SVM algorithm. The results show that the proposed model may improve classification performance in comparison to other typical multiclass classification algorithms.
https://doi.org/10.5391/IJFIS.2006.6.4.331

A Decision Support Model for Sustainable Collaboration Level on Supply Chain Management using Support Vector Machines (Support Vector Machines을 이용한 공급사슬관리의 지속적 협업 수준에 대한 의사결정모델)

Lim, Se-Hun
- Journal of Distribution Research, v.10 no.3, pp.1-14, 2005
Controlling performance and Sustainable Collaboration (SC) is important for successful Supply Chain Management (SCM). This research developed a control model that analyzes SCM performance, based on a Balanced Scorecard (BSC), and SC using a Support Vector Machine (SVM). 108 SCM specialists completed the questionnaires, and we analyzed the resulting experimental data set using SVM. This research compared the forecasting accuracy for SCMSC across four types of SVM kernels: (1) linear, (2) polynomial, (3) Radial Basis Function (RBF), and (4) sigmoid; the ranking was linear > RBF > sigmoid > polynomial. The study then compares the prediction performance of the SVM linear kernel with that of an Artificial Neural Network (ANN). The findings show that the SVM linear kernel is the most effective for forecasting SCMSC. The SVM linear kernel thus provides a promising alternative for controlling the SC level. A company that pursues SCM can use SC information in the SVM model.
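
For reference, the four kernel types compared in the supply chain abstract can be written out in their commonly used forms. The parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions that follow a widespread library convention; the abstract does not report the settings that were used.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)


# Made-up two-dimensional inputs.
x, y = [1.0, 2.0], [3.0, 0.5]
values = {
    "linear": linear_kernel(x, y),
    "polynomial": polynomial_kernel(x, y),
    "rbf": rbf_kernel(x, y),
    "sigmoid": sigmoid_kernel(x, y),
}
```

The linear kernel has no kernel parameters of its own, which leaves only the cost coefficient to tune; this may be one practical reason to try it first on a small survey data set such as the 108 responses described above.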

