Search | Korea Science

Developing an Ensemble Classifier for Bankruptcy Prediction (부도 예측을 위한 앙상블 분류기 개발)

Min, Sung-Hwan
- Journal of Korea Society of Industrial Information Systems, v.17 no.7, pp.139-148, 2012
An ensemble of classifiers employs a set of individually trained classifiers and combines their predictions. It has been found that in most cases ensembles produce more accurate predictions than their base classifiers. Combining outputs from multiple classifiers, known as ensemble learning, is one of the standard and most important techniques for improving classification accuracy in machine learning. An ensemble of classifiers is effective only if the individual classifiers make decisions that are as diverse as possible. Bagging is the most popular ensemble learning method for generating a diverse set of classifiers. Diversity in bagging is obtained by using different training sets: the training data subsets are randomly drawn with replacement from the entire training dataset. The random subspace method is an ensemble construction technique that uses different attribute subsets. As in bagging, the training dataset is modified, but the modification is performed in the feature space. Bagging and random subspace are well-known and popular ensemble algorithms. However, few studies have dealt with the integration of bagging and random subspace using SVM classifiers, though there is great potential for useful applications in this area. This paper proposes methods for improving SVM performance using a hybrid ensemble strategy for bankruptcy prediction and applies the proposed ensemble model to the bankruptcy prediction problem using a real data set from Korean companies.
https://doi.org/10.9723/jksiis.2012.17.7.139
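
The combination of bagging and random subspace described in this abstract can be sketched as a sampling plan: each ensemble member gets rows drawn with replacement and a feature subset drawn without replacement. This is a minimal illustration, not the paper's implementation; the function name, parameters, and sizes are assumptions made for the example.

```python
import random

def bagging_subspace_plan(n_rows, n_features, n_members, subspace_size, seed=0):
    """Return one (row_indices, feature_indices) pair per ensemble member.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    plan = []
    for _ in range(n_members):
        # Bagging: draw n_rows row indices with replacement.
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]
        # Random subspace: draw a feature subset without replacement.
        features = sorted(rng.sample(range(n_features), subspace_size))
        plan.append((rows, features))
    return plan

plan = bagging_subspace_plan(n_rows=8, n_features=5, n_members=3, subspace_size=2)
for rows, features in plan:
    print(rows, features)
```

Each base classifier (an SVM in the paper) would then be trained on its own rows restricted to its own features.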

Bankruptcy prediction using ensemble SVM model (앙상블 SVM 모형을 이용한 기업 부도 예측)

Choi, Ha Na;Lim, Dong Hoon
- Journal of the Korean Data and Information Science Society, v.24 no.6, pp.1113-1125, 2013
Corporate bankruptcy prediction has been an important topic in the accounting and finance field for a long time. Several data mining techniques have been used for bankruptcy prediction. However, a single model has many limitations when applied to real classification problems. This study proposes an ensemble SVM (support vector machine) model that combines SVM models with different kernel functions. Our ensemble model is built and evaluated using a v-fold cross-validation approach. The k top-performing models are recruited into the ensemble. The classification is then carried out by majority vote of the ensemble. In this paper, we investigate the performance of the ensemble SVM classifier in terms of accuracy, error rate, sensitivity, specificity, ROC curve, and AUC, and compare it with single SVM classifiers on a financial ratios dataset and a simulated dataset. The results confirmed the advantages of our method: it is robust while providing good performance.
https://doi.org/10.7465/jkdi.2013.24.6.1113
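
The two steps named in this abstract, recruiting the k top-performing models and classifying by majority vote, can be sketched as follows. The kernel names, cross-validation scores, and predictions are made-up values for illustration only; they are not results from the paper.

```python
from collections import Counter

def select_top_k(cv_scores, k):
    """Return the names of the k models with the highest cross-validation score."""
    ranked = sorted(cv_scores, key=cv_scores.get, reverse=True)
    return ranked[:k]

def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote."""
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted

# Hypothetical cross-validation accuracies for SVMs with different kernels.
cv_scores = {"linear": 0.81, "rbf": 0.86, "poly": 0.79, "sigmoid": 0.74}
# Hypothetical predictions of each model on four test companies (1 = bankrupt).
preds = {
    "linear": [1, 0, 1, 1],
    "rbf": [1, 1, 0, 1],
    "poly": [0, 0, 1, 1],
    "sigmoid": [0, 0, 0, 0],
}

members = select_top_k(cv_scores, k=3)
voted = majority_vote([preds[name] for name in members])
print(members, voted)
```

Using an odd number of members with binary labels avoids tied votes.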

Improving an Ensemble Model by Optimizing Bootstrap Sampling (부트스트랩 샘플링 최적화를 통한 앙상블 모형의 성능 개선)

Min, Sung-Hwan
- Journal of Internet Computing and Services, v.17 no.2, pp.49-57, 2016
Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving prediction accuracy. Bagging is one of the most popular ensemble learning techniques and is known to be successful in increasing the prediction accuracy of individual classifiers. Bagging draws bootstrap samples from the training sample, applies the classifier to each bootstrap sample, and then combines the predictions of these classifiers to get the final classification result. Bootstrap samples are simple random samples selected from the original training data, so, due to the randomness, not all bootstrap samples are equally informative. In this study, we propose a new method for improving the performance of the standard bagging ensemble by optimizing bootstrap samples. A genetic algorithm is used to optimize the bootstrap samples of the ensemble to improve the prediction accuracy of the ensemble model. The proposed model is applied to a bankruptcy prediction problem using a real dataset from Korean companies. The experimental results show the effectiveness of the proposed model.
https://doi.org/10.7472/jksii.2016.17.2.49

Application of Random Forests to Assessment of Importance of Variables in Multi-sensor Data Fusion for Land-cover Classification

Park, No-Wook;Chi, Kwang-Hoon
- Korean Journal of Remote Sensing, v.22 no.3, pp.211-219, 2006
A random forests classifier is applied to multi-sensor data fusion for supervised land-cover classification in order to account for the importance of variables. The random forests approach is a non-parametric ensemble classifier based on CART-like trees. Its distinguishing feature is that the importance of a variable can be estimated by randomly permuting the variable of interest in all the out-of-bag samples for each classifier. Two different multi-sensor data sets for supervised classification were used to illustrate the applicability of random forests: one with optical and polarimetric SAR data and the other with multi-temporal Radarsat-1 and ENVISAT ASAR data sets. In the experiments, the random forests approach extracted important variables or bands for land-cover discrimination and showed reasonably good performance in terms of classification accuracy.
https://doi.org/10.7780/kjrs.2006.22.3.211
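
The permutation idea behind the variable importance mentioned in this abstract can be sketched for a single classifier: shuffle one variable across the evaluation samples and measure how much accuracy drops. This is a simplified illustration, assuming the `X`, `y` passed in stand in for one tree's out-of-bag samples; the function names and the toy classifier are hypothetical.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, feature, seed=0):
    """Accuracy drop after shuffling one feature column (X itself is not modified)."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    column = [row[feature] for row in X]
    rng.shuffle(column)
    # Rebuild each row with the shuffled value in place of the original one.
    X_perm = [row[:feature] + [value] + row[feature + 1:]
              for row, value in zip(X, column)]
    return baseline - accuracy(predict, X_perm, y)

# Toy classifier that only looks at feature 0 (an assumption for the demo).
predict = lambda row: int(row[0] > 0.5)
X = [[0.1, 5.0], [0.9, 3.0], [0.2, 7.0], [0.8, 1.0]]
y = [0, 1, 0, 1]

drop_used = permutation_importance(predict, X, y, feature=0)
drop_unused = permutation_importance(predict, X, y, feature=1)
print(drop_used, drop_unused)
```

A variable the classifier ignores yields no accuracy drop, while permuting a variable it relies on can only lower the accuracy of this toy model. In a random forest the drop would be averaged over all trees.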

Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks

You, Shingchern D.;Liu, Chien-Hung;Lin, Jia-Wei
- KSII Transactions on Internet and Information Systems (TIIS), v.15 no.2, pp.729-748, 2021
Vocal detection is one of the fundamental steps in musical information retrieval. Typically, the detection process consists of feature extraction and classification steps. Recently, neural networks have been shown to outperform traditional classifiers. In this paper, we report our study on how to improve detection accuracy further by carefully choosing the parameters of the deep network model. Through experiments, we conclude that a feature-classifier model is still better than an end-to-end model. The recommended model uses a spectrogram as the input plane, and the classifier is an 18-layer convolutional neural network (CNN). With this arrangement, the proposed model improves the accuracy over the existing literature from 91.8% to 94.1% on the Jamendo dataset. As accuracy on this dataset already exceeds 90%, an improvement of 2.3 percentage points is difficult to achieve and valuable. If even higher accuracy is required, ensemble learning may be used. The recommended setting is a majority vote over seven of the proposed models, which increases the accuracy by about 1.1% on the Jamendo dataset.
https://doi.org/10.3837/tiis.2021.02.019

Credit Risk Evaluations of Online Retail Enterprises Using Support Vector Machines Ensemble: An Empirical Study from China

LI, Xin;XIA, Han
- The Journal of Asian Finance, Economics and Business, v.9 no.8, pp.89-97, 2022
The e-commerce market faces significant credit risks due to the complexity of the industry and information asymmetries, and credit risk has started to stymie the growth of e-commerce. However, there is no reliable system for evaluating the creditworthiness of e-commerce companies. Therefore, this paper constructs a credit risk evaluation index system that comprehensively considers the online and offline behavior of online retail enterprises, including 15 indicators that reflect online credit risk and 15 indicators that reflect offline credit risk. This paper establishes an integration method based on a fuzzy integral support vector machine, which takes the factor analysis results of the credit risk evaluation index system of online retail enterprises as the input and the credit risk evaluation results of online retail enterprises as the output. The method takes into account both the classification results of each sub-classifier and the importance of each sub-classifier's decision to the final decision. Sample data from 1,500 online retail loan customers of a bank are used to test the model. The empirical results demonstrate that the proposed method outperforms a single SVM and a traditional SVM aggregation technique based on majority voting in terms of classification accuracy, which provides a basis for banks to establish a reliable evaluation system.
https://doi.org/10.13106/jafeb.2022.vol9.no8.0089

Ensemble Learning of Region Based Classifiers (지역 기반 분류기의 앙상블 학습)

Choi, Sung-Ha;Lee, Byung-Woo;Yang, Ji-Hoon
- The KIPS Transactions:PartB, v.14B no.4, pp.303-310, 2007
In machine learning, the ensemble classifier, which is a set of classifiers, has been introduced to achieve higher accuracy than individual classifiers. We propose a new ensemble learning method that employs a set of region-based classifiers. Since the distribution of data can differ between regions of the feature space, we split the data, generate classifiers based on each region, and apply weighted voting among the classifiers. We used 11 data sets from the UCI Machine Learning Repository to compare the performance of our new ensemble method with that of individual classifiers as well as existing ensemble methods such as bagging and boosting. As a result, we found that our method produced improved performance, particularly when the base learner is Naive Bayes or SVM.
https://doi.org/10.3745/KIPSTB.2007.14-B.4.303

Illegal Cash Accommodation Detection Modeling Using Ensemble Size Reduction (신용카드 불법현금융통 적발을 위한 축소된 앙상블 모형)

Lee, Hwa-Kyung;Han, Sang-Bum;Jhee, Won-Chul
- Journal of Intelligence and Information Systems, v.16 no.1, pp.93-116, 2010
An ensemble approach is applied to the detection modeling of illegal cash accommodation (ICA), a well-known type of fraudulent credit card use in Far East nations that has not been addressed in the academic literature. The performance of a fraud detection model (FDM) suffers from the imbalanced data problem, which can be remedied to some extent using an ensemble of many classifiers. It is generally accepted that ensembles of classifiers produce better accuracy than a single classifier, provided there is diversity in the ensemble. Furthermore, recent research reveals that it may be better to ensemble some selected classifiers instead of all of the classifiers at hand. For the effective detection of ICA, we adopt an ensemble size reduction technique that prunes the ensemble of all classifiers using accuracy and diversity measures. The diversity in an ensemble manifests itself as disagreement or ambiguity among members. Data imbalance intrinsic to FDM affects our approach for ICA detection in two ways. First, we suggest a training procedure with over-sampling methods to obtain diverse training data sets. Second, we use some variants of accuracy and diversity measures that focus on the fraud class. We also dynamically calculate the diversity measure (Forward Addition and Backward Elimination). In our experiments, Neural Networks, Decision Trees and Logit Regressions are the base models used as ensemble members, and the performance of homogeneous ensembles is compared with that of heterogeneous ensembles. The experimental results show that the reduced-size ensemble is as accurate on average over the data sets tested as the non-pruned version, which provides benefits in terms of application efficiency and reduced complexity of the ensemble.
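
Diversity as disagreement among ensemble members, as this abstract puts it, can be illustrated with the pairwise disagreement rate: the fraction of samples on which two classifiers predict different labels. This is one common diversity measure and an assumption for the example; the abstract does not state which variants the authors use. The predictions below are made-up values.

```python
from itertools import combinations

def disagreement(preds_a, preds_b):
    """Fraction of samples on which two classifiers predict different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def mean_pairwise_disagreement(all_preds):
    """Average disagreement over every pair of ensemble members."""
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

# Hypothetical predictions of three members on four transactions (1 = fraud).
member_preds = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
]
diversity = mean_pairwise_disagreement(member_preds)
print(diversity)
```

A pruning procedure could add or drop members one at a time and recompute such a measure together with accuracy after each step.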

Learning to Prevent Inactive Student of Indonesia Open University

Tama, Bayu Adhi
- Journal of Information Processing Systems, v.11 no.2, pp.165-172, 2015
The inactive student rate is becoming a major problem in most open universities worldwide. In Indonesia, roughly 36% of students were found to be inactive in 2005. Data mining has been successfully employed to solve problems in many domains, including education. We propose a method for preventing student inactivity by mining knowledge from student record systems with several state-of-the-art ensemble methods, such as Bagging, AdaBoost, Random Subspace, Random Forest, and Rotation Forest. The most influential attributes that affect whether a student becomes inactive, including demographic attributes (marital status and employment), were successfully identified. The complexity and accuracy of the classification techniques were also compared, and the experimental results show that Rotation Forest, with a decision tree as the base classifier, achieves the best performance compared to the other classifiers.
https://doi.org/10.3745/JIPS.04.0015

Pattern Selection Using the Bias and Variance of Ensemble (앙상블의 편기와 분산을 이용한 패턴 선택)

Shin, Hyunjung;Cho, Sungzoon
- Journal of Korean Institute of Industrial Engineers, v.28 no.1, pp.112-127, 2002
A useful pattern is a pattern that contributes much to learning. For a classification problem those patterns near the class boundary surfaces carry more information to the classifier. For a regression problem the ones near the estimated surface carry more information. In both cases, the usefulness is defined only for those patterns either without error or with negligible error. Using only the useful patterns gives several benefits. First, computational complexity in memory and time for learning is decreased. Second, overfitting is avoided even when the learner is over-sized. Third, learning results in more stable learners. In this paper, we propose a pattern 'utility index' that measures the utility of an individual pattern. The utility index is based on the bias and variance of a pattern trained by a network ensemble. In classification, the pattern with a low bias and a high variance gets a high score. In regression, on the other hand, the one with a low bias and a low variance gets a high score. Based on the distribution of the utility index, the original training set is divided into a high-score group and a low-score group. Only the high-score group is then used for training. The proposed method is tested on synthetic and real-world benchmark datasets. The proposed approach gives a better or at least similar performance.

