• Title/Summary/Keyword: optimal classification method

Improving an Ensemble Model Using Instance Selection Method (사례 선택 기법을 활용한 앙상블 모형의 성능 개선)

  • Min, Sung-Hwan
    • Journal of Korean Society of Industrial and Systems Engineering / v.39 no.1 / pp.105-115 / 2016
  • Ensemble classification combines individually trained classifiers to yield more accurate predictions than individual models. Ensemble techniques are very useful for improving the generalization ability of classifiers. The random subspace ensemble technique is a simple but effective method for constructing ensemble classifiers; it involves randomly drawing a subset of the features for each classifier in the ensemble. The instance selection technique involves selecting critical instances while removing irrelevant and noisy instances from the original dataset. The instance selection and random subspace methods are both well known in the field of data mining and have proven to be very effective in many applications. However, few studies have focused on integrating the two. Therefore, this study proposes a new hybrid ensemble model that integrates instance selection and random subspace techniques using genetic algorithms (GAs) to improve the performance of a random subspace ensemble model. GAs are used to select optimal (or near-optimal) instances, which are used as input data for the random subspace ensemble model. The proposed model was applied to both Kaggle credit data and corporate credit data, and the results were compared with those of other models in terms of classification accuracy, level of diversity, and average classification rate of the base classifiers in the ensemble. The experimental results demonstrated that the proposed model outperformed the other models, including the single model, the instance selection model, and the original random subspace ensemble model.
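A minimal sketch of how a fixed instance subset could feed a random subspace ensemble, assuming a nearest-centroid base learner and plain majority voting (the abstract does not name the base classifier or the combination rule). The GA that would search over `instance_mask` is omitted, and all function names are hypothetical.

```python
import random
from collections import Counter


def nearest_centroid_fit(X, y, features):
    """Per-class mean vectors, restricted to the chosen feature subset."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def nearest_centroid_predict(model, features, row):
    """Class whose centroid is closest (squared Euclidean) in the subspace."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(features, model[c]))
    return min(model, key=dist)


def random_subspace_ensemble(X, y, n_estimators, subspace_size, instance_mask=None, seed=0):
    """Train one base learner per random feature subset on the selected instances.

    instance_mask (hypothetical) is the 0/1 vector a GA would optimise; None keeps all rows."""
    rng = random.Random(seed)
    if instance_mask is not None:
        X = [row for row, keep in zip(X, instance_mask) if keep]
        y = [lab for lab, keep in zip(y, instance_mask) if keep]
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_estimators):
        features = rng.sample(range(n_features), subspace_size)
        ensemble.append((features, nearest_centroid_fit(X, y, features)))
    return ensemble


def ensemble_predict(ensemble, row):
    """Majority vote over the base learners."""
    votes = Counter(nearest_centroid_predict(m, f, row) for f, m in ensemble)
    return votes.most_common(1)[0][0]
```

In a GA wrapper, each chromosome would be an `instance_mask`, and its fitness could be the validation accuracy of the ensemble trained on it.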

Optimal EEG Locations for EEG Feature Extraction with Application to User's Intention using a Robust Neuro-Fuzzy System in BCI

  • Lee, Chang Young;Aliyu, Ibrahim;Lim, Chang Gyoon
    • Journal of Integrative Natural Science / v.11 no.4 / pp.167-183 / 2018
  • Electroencephalogram (EEG) recording provides a new way to support human-machine communication and gives us an opportunity to analyze the neuro-dynamics of human cognition. Machine learning is a powerful tool for EEG classification, and it can compensate for the high variability of EEG when data are analyzed in real time. However, the optimal EEG electrode locations must be identified first in order to extract the most relevant features from brain-wave data. In this paper, we propose an intelligent system model that learns the optimal EEG electrode locations for a specific problem and extracts EEG features from them. The proposed system is basically a fuzzy system that is structurally implemented as a neural network. A fuzzy clustering method is used to determine the optimal number of fuzzy rules from the features extracted from the EEG data. The parameters and weights obtained while determining the number of rules must then be tuned during the learning process, and genetic algorithms are used to obtain the optimized parameters. Through various experiments on EEG data for four movements of the right arm, we present results obtained using the optimal number of rules and non-symmetric membership functions.

A Hybrid Soft Computing Technique for Software Fault Prediction based on Optimal Feature Extraction and Classification

  • Balaram, A.;Vasundra, S.
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.348-358 / 2022
  • Software fault prediction (SFP) identifies faults in software modules from software properties, which helps to evaluate software quality in terms of cost and effort. Recently, several software fault detection techniques have been proposed to classify modules as faulty or non-faulty. However, most studies have demonstrated predictive power only on their own datasets, and performance is not consistent across datasets. In this paper, we propose a hybrid soft computing technique for SFP based on optimal feature extraction and classification (HST-SFP). First, we introduce the bat-induced butterfly optimization (BBO) algorithm for optimal feature selection, which identifies the most informative features and removes unnecessary ones. Second, we develop a layered recurrent neural network (L-RNN) based classifier that predicts software faults from the selected features to enhance detection accuracy. Finally, the proposed HST-SFP technique is shown to outperform existing techniques in terms of probability of detection, accuracy, probability of false alarm, precision, ROC, F-measure and AUC.

Voice Classification Algorithm for Sasang Constitution Using Support Vector Machine (SVM을 이용한 음성 사상체질 분류 알고리즘)

  • Kang, Jae-Hwan;Do, Jun-Hyeong;Kim, Jong-Yeol
    • Journal of Sasang Constitutional Medicine / v.22 no.1 / pp.17-25 / 2010
  • 1. Objectives: Voice diagnosis has been used to classify individuals into Sasang constitutions in SCM (Sasang Constitutional Medicine) and to recognize a person's health condition in TKM (Traditional Korean Medicine). In this paper, we propose a new voice classification algorithm for Sasang constitution. 2. Methods: The algorithm is based on the SVM (Support Vector Machine), a classification method that separates two distinct groups by finding an arbitrary nonlinear boundary in vector space; it shows high classification performance even with a small training set. We designed the algorithm to use three SVM classifiers to classify subjects into four groups: three constitutional groups plus an additional indecision group. 3. Results: At the optimal setting, 32.2% of the voice data were classified into one of the three constitutional groups, and 79.8% of those were classified correctly. 4. Conclusions: This new classification method, which includes an indecision group, appears more efficient than the standard algorithm that classifies into the three constitutional groups only. A more thorough investigation of voice features is required to improve classification efficiency for Sasang constitution.
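The abstract does not say how the three SVM outputs are combined into four groups. One plausible reading, sketched here purely as an assumption, treats each SVM as a one-vs-rest scorer and assigns a constitution only when exactly one scorer is positive; every other case goes to the indecision group. The second helper computes the two figures the Results report (share of data assigned to a constitution, and accuracy within that share). Group labels and function names are hypothetical.

```python
INDECISION = "indecision"


def classify_with_indecision(scores, margin=0.0):
    """scores maps each constitutional group to the decision value of its
    (hypothetical) one-vs-rest SVM; a group is assigned only if exactly one is confident."""
    confident = [group for group, value in scores.items() if value > margin]
    return confident[0] if len(confident) == 1 else INDECISION


def coverage_and_accuracy(predictions, truths):
    """Fraction of samples assigned to a group, and accuracy among those samples."""
    decided = [(p, t) for p, t in zip(predictions, truths) if p != INDECISION]
    coverage = len(decided) / len(predictions)
    accuracy = sum(p == t for p, t in decided) / len(decided) if decided else 0.0
    return coverage, accuracy
```

Raising `margin` trades coverage for accuracy, which is the kind of trade-off behind reporting a 32.2% coverage alongside 79.8% accuracy.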

A Three-Step Preprocessing Algorithm for Enhanced Classification of E-Mail Recommendation System (이메일 추천 시스템의 분류 향상을 위한 3단계 전처리 알고리즘)

  • Jeong Ok-Ran;Cho Dong-Sub
    • The Transactions of the Korean Institute of Electrical Engineers D / v.54 no.4 / pp.251-258 / 2005
  • Automatic document classification performance may differ significantly according to the characteristics of the documents being classified as well as the classifier's performance. This research identifies the characteristics of e-mail documents and applies a three-step preprocessing algorithm that minimizes their atypical characteristics. In the first stage, an uncertainty-based sampling algorithm using the Mean Absolute Deviation (MAD) is used to select the training documents for rule generation at classification time. In the second stage, a per-attribute weighting method is applied to increase the discriminating capability of terms that appear in the title of the e-mail. In the third and last stage, classification accuracy for each category is increased by using a dynamic threshold in the Naive Bayesian estimation algorithm. We then implemented an e-mail recommendation system using the three-step preprocessing algorithm, which enables users to classify mail directly and optimally by recommending the applicable category when a mail arrives.
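The abstract does not define how MAD becomes an uncertainty score. A minimal sketch of the first stage, assuming each candidate document already has per-category classifier scores and that a small MAD (scores spread evenly across categories) marks an uncertain document worth selecting for training; the function names are hypothetical.

```python
def mean_absolute_deviation(values):
    """Mean absolute deviation of values around their arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def select_uncertain_documents(doc_scores, k):
    """Return the k document ids whose per-category scores have the smallest MAD,
    i.e. the documents the current classifier is least sure about."""
    return sorted(doc_scores, key=lambda d: mean_absolute_deviation(doc_scores[d]))[:k]
```

For example, a document scored `[0.34, 0.33, 0.33]` across three categories has a much smaller MAD than one scored `[0.9, 0.05, 0.05]`, so it is selected first.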

Fast Pattern Classification with the Multi-layer Cellular Nonlinear Networks (CNN) (다층 셀룰라 비선형 회로망(CNN)을 이용한 고속 패턴 분류)

  • 오태완;이혜정;손홍락;김형석
    • The Transactions of the Korean Institute of Electrical Engineers D / v.52 no.9 / pp.540-546 / 2003
  • A fast pattern classification algorithm based on Cellular Nonlinear Network (CNN) dynamic programming is proposed. The Cellular Nonlinear Network is an analog parallel processing architecture, and dynamic programming is an efficient computational method for optimization problems. Combining the merits of these two technologies yields fast pattern classification with optimization. In CNN-based dynamic programming, if exemplars and test patterns are presented as the goals and the start positions, respectively, the optimal paths from the test patterns to their closest exemplars are found. These paths are used as aggregating keys for classification. The algorithm resembles conventional neural-network-based methods in its use of exemplar patterns, but differs substantially in its use of the most-likely-path finding of dynamic programming. Pattern classification performs well regardless of the degree of nonlinearity in the class borders.
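The CNN carries out the dynamic programming in analog parallel hardware. A conventional digital stand-in, assumed here only for illustration, is a multi-source breadth-first search on a grid that propagates from every exemplar (goal) at once, so each cell ends up labelled by the exemplar at the end of its shortest path. Patterns are assumed to be points on a 4-connected grid, and all names are hypothetical.

```python
from collections import deque


def propagate_exemplars(width, height, exemplars):
    """Multi-source BFS over a 4-connected grid.

    exemplars maps (x, y) goal cells to class labels. Returns, for every reachable
    cell, the path length to its closest exemplar and that exemplar's label."""
    dist, label = {}, {}
    queue = deque()
    for cell, cls in exemplars.items():
        dist[cell], label[cell] = 0, cls
        queue.append(cell)
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < width and 0 <= ny < height and nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1   # optimal cost-to-go grows by one step
                label[nxt] = label[(x, y)]     # inherit the class of the exemplar reached
                queue.append(nxt)
    return dist, label
```

A test pattern at cell `p` is then classified as `label[p]`; because the labels follow path lengths rather than a fitted boundary, the class borders can take any shape.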

A Study on the Application of Interpolation and Terrain Classification for Accuracy Improvement of Digital Elevation Model (수지표고지형의 정확도 향상을 위한 지형의 분류와 보간법의 상용에 관한 연구)

  • 문두열
    • Journal of Ocean Engineering and Technology / v.8 no.2 / pp.64-79 / 1994
  • In this study, terrain classification based on quantitative classification parameters, together with a suitable interpolation method, was applied to improve the accuracy of digital elevation models and to increase their practical use in aerial photogrammetry. The terrain was classified into three groups using quantitative classification parameters (the ratio of horizontal to inclined area, the magnitude of harmonic vectors, the deviation of vectors, and the number of breaklines), and a suitable interpolation method was proposed for each group. In addition, the accuracy of digital elevation models with large grid intervals was improved by applying a combined interpolation suited to each terrain group. As a result, this study provides an algorithm that objectively classifies the topography of an area of interest and determines the optimal data interpolation scheme for the given topography.

Integrating Spatial Proximity with Manifold Learning for Hyperspectral Data

  • Kim, Won-Kook;Crawford, Melba M.;Lee, Sang-Hoon
    • Korean Journal of Remote Sensing / v.26 no.6 / pp.693-703 / 2010
  • The high spectral resolution of hyperspectral data enables the analysis of complex natural phenomena that are reflected nonlinearly in the data. Although many manifold learning methods have been developed for such problems, most do not consider the spatial correlation between samples, which is inherent in and useful for remote sensing data. We propose a manifold learning method that directly combines spatial proximity and spectral similarity within a kernel PCA framework. A gain factor due to spatial proximity is first modelled with a heat kernel and is added to the original similarity computed from the spectral values of each pair of samples. Parameters are tuned with an intelligent grid search (IGS) method so that the derived manifold coordinates achieve optimal classification accuracy. Of particular interest is the method's performance with small training sets, because labelled samples are usually scarce due to their high acquisition cost. The proposed spatial kernel PCA (KPCA) is compared with PCA in terms of classification accuracy using the nearest-neighbour classification method.
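A minimal sketch of the combined similarity, assuming (the abstract does not say) that the spectral similarity is also a heat (Gaussian) kernel; `t_spec` and `t_spat` are hypothetical bandwidths of the kind the IGS step would tune. The matrix is then double-centred as kernel PCA requires; the eigendecomposition that yields the manifold coordinates is omitted.

```python
import math


def heat_kernel(a, b, t):
    """exp(-||a - b||^2 / t) for two equal-length vectors."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / t)


def spatial_spectral_kernel(spectra, positions, t_spec, t_spat):
    """Spectral similarity plus a spatial-proximity gain, both heat kernels."""
    n = len(spectra)
    return [[heat_kernel(spectra[i], spectra[j], t_spec)
             + heat_kernel(positions[i], positions[j], t_spat)
             for j in range(n)] for i in range(n)]


def center_kernel(K):
    """Double-centre a symmetric kernel matrix (feature-space mean removed)."""
    n = len(K)
    row_means = [sum(r) / n for r in K]
    total_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
            for i in range(n)]
```

Two pixels with identical spectra get a similarity of 2 when they are co-located and close to 1 when far apart, so spatial neighbours are pulled together on the manifold.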

Selection Method of Fuzzy Partitions in Fuzzy Rule-Based Classification Systems (퍼지 규칙기반 분류시스템에서 퍼지 분할의 선택방법)

  • Son, Chang-S.;Chung, Hwan-M.;Kwon, Soon-H.
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.3 / pp.360-366 / 2008
  • The initial fuzzy partitions in fuzzy rule-based classification systems are determined by considering the domain of each attribute in the given data, and the optimal classification boundaries within the fuzzy partitions can be discovered by tuning their parameters with various learning processes such as neural networks and genetic algorithms. In this paper, we propose a selection method for fuzzy partitions based on statistical information that maximizes pattern classification performance without a learning process; the statistical information is used to extract, from the numerical data, the uncertainty regions (i.e., the regions in which the classification boundaries are determined) in each input attribute. Methods for extracting candidate rules associated with the partition intervals generated from the statistical information, and for minimizing the coupling problem between candidate rules, are also discussed. To show the effectiveness of the proposed method, we compared its classification accuracy with that of conventional methods on the IRIS and New Thyroid Cancer data. The experimental results confirm that the proposed method, which considers only the statistical information of the numerical patterns, provides classification accuracy equal to or better than that of the conventional methods.
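The abstract does not give the exact statistics used. As an assumption for illustration only, this sketch summarises each class on a single attribute by mean ± k standard deviations and reports the overlap of two classes' intervals as that attribute's uncertainty region, i.e. where a classification boundary would have to be placed. Function names and the parameter `k` are hypothetical.

```python
import statistics


def class_interval(values, k=1.0):
    """Interval mean ± k * population standard deviation for one class on one attribute."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return (m - k * s, m + k * s)


def uncertainty_region(values_a, values_b, k=1.0):
    """Overlap of the two class intervals, or None if the classes do not overlap."""
    lo_a, hi_a = class_interval(values_a, k)
    lo_b, hi_b = class_interval(values_b, k)
    lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
    return (lo, hi) if lo < hi else None
```

Fuzzy partition boundaries would then only need to be placed inside the returned interval; an attribute with no overlap can be split crisply.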

Selecting the Optimal Hidden Layer of Extreme Learning Machine Using Multiple Kernel Learning

  • Zhao, Wentao;Li, Pan;Liu, Qiang;Liu, Dan;Liu, Xinwang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.12 / pp.5765-5781 / 2018
  • Extreme learning machine (ELM) is emerging as a powerful machine learning method in a variety of application scenarios owing to its high accuracy, fast learning speed and ease of implementation. However, how to select the optimal hidden layer of ELM is still an open question in the ELM community; in particular, the number of hidden-layer nodes is a sensitive hyperparameter that significantly affects ELM's performance. To address this problem, we propose to adopt multiple kernel learning (MKL) to design a multi-hidden-layer-kernel ELM (MHLK-ELM). Specifically, we first integrate kernel functions with the random feature mapping of ELM to design a hidden-layer-kernel ELM (HLK-ELM), which serves as the basis of MHLK-ELM. We then use MKL to derive two versions of MHLK-ELM, called sparse and non-sparse MHLK-ELM. Both types can effectively find the optimal linear combination of multiple HLK-ELMs for different classification and regression problems. Experimental results on seven data sets (three for classification and four for regression) demonstrate that the proposed MHLK-ELM achieves superior performance compared with conventional ELM and basic HLK-ELM.
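For reference, the conventional ELM baseline can be sketched as follows: hidden-layer weights are drawn at random and never trained, and only the output weights are solved in closed form. This is a plain-Python sketch of basic ELM with a small ridge term (an assumption added for numerical stability), not of the proposed kernel variants; `n_hidden` is the sensitive hyperparameter the abstract refers to, and all names are hypothetical.

```python
import math
import random


def _solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def hidden_layer(X, W, bias):
    """Sigmoid activations of the random, untrained hidden layer for each row of X."""
    return [[1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(w_j, x)) + b_j)))
             for w_j, b_j in zip(W, bias)] for x in X]


def elm_fit(X, y, n_hidden, reg=1e-3, seed=0):
    rng = random.Random(seed)
    W = [[rng.uniform(-1.0, 1.0) for _ in X[0]] for _ in range(n_hidden)]
    bias = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    H = hidden_layer(X, W, bias)
    # Output weights from the ridge-regularised normal equations (H^T H + reg*I) beta = H^T y.
    A = [[sum(h[i] * h[j] for h in H) + (reg if i == j else 0.0) for j in range(n_hidden)]
         for i in range(n_hidden)]
    rhs = [sum(h[i] * t for h, t in zip(H, y)) for i in range(n_hidden)]
    return W, bias, _solve(A, rhs)


def elm_predict(model, X):
    W, bias, beta = model
    return [sum(h_j * b_j for h_j, b_j in zip(h, beta)) for h in hidden_layer(X, W, bias)]
```

Because only `beta` is learned, training reduces to one linear solve; the choice of `n_hidden` (and of the random feature map) is what HLK-ELM and MHLK-ELM aim to make less sensitive.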