• Title/Summary/Keyword: Feature elimination method


A Design of an Optimized Classifier based on Feature Elimination for Gene Selection (유전자 선택을 위해 속성 삭제에 기반을 둔 최적화된 분류기 설계)

  • Lee, Byung-Kwan;Park, Seok-Gyu;Tifani, Yusrina
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.8 no.5 / pp.384-393 / 2015
  • This paper proposes an optimized classifier based on feature elimination (OCFE) for gene selection that combines two feature elimination methods, ReliefF and SVM-RFE. ReliefF is a filter feature selection algorithm that ranks features by their importance. SVM-RFE is a wrapper feature selection algorithm that ranks features by their weights. Combining the two methods yields a lower average error rate: 0.3016138 for OCFE versus 0.3096779 for SVM-RFE. The proposed method also achieves better accuracy: 70% for OCFE versus 69% for SVM-RFE.

Gene Selection Based on Support Vector Machine using Bootstrap (붓스트랩 방법을 활용한 SVM 기반 유전자 선택 기법)

  • Song, Seuck-Heun;Kim, Kyoung-Hee;Park, Chang-Yi;Koo, Ja-Yong
    • The Korean Journal of Applied Statistics / v.20 no.3 / pp.531-540 / 2007
  • The recursive feature elimination for support vector machine is known to be useful in selecting relevant genes. Since the criterion for choosing relevant genes is the absolute value of a coefficient, the recursive feature elimination may suffer from a scaling problem. We propose a modified version of the recursive feature elimination algorithm using bootstrap. In our method, the criterion for determining relevant genes is the absolute value of a coefficient divided by its standard error, which accounts for statistical variability of the coefficient. Through numerical examples, we illustrate that our method is effective in gene selection.
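The criterion described in this abstract (absolute coefficient divided by its bootstrap standard error) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the subgradient-descent linear SVM, the function names, the hyperparameters, the one-feature-per-round elimination, and the toy data are all assumptions made for the example.

```python
import random
import statistics

def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Fit weights by full-batch subgradient descent on the L2-regularized hinge loss.

    A small stand-in for a real SVM solver (assumption for this sketch).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the hinge subgradient.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w

def standardized_scores(X, y, n_boot=20, seed=0):
    """Return |w_j| / se_j, where se_j is estimated from bootstrap refits."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w_hat = fit_linear_svm(X, y)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        boots.append(fit_linear_svm([X[i] for i in idx], [y[i] for i in idx]))
    scores = []
    for j in range(d):
        se = statistics.stdev([w[j] for w in boots])
        scores.append(abs(w_hat[j]) / (se + 1e-12))
    return scores

def bootstrap_rfe(X, y, n_keep):
    """Drop the feature with the smallest standardized score until n_keep remain."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in X]
        scores = standardized_scores(sub, y)
        del active[scores.index(min(scores))]
    return active

# Toy data (hypothetical): feature 0 carries the class signal, features 1-3 are noise.
rng = random.Random(1)
y = [1 if i % 2 == 0 else -1 for i in range(40)]
X = [[1.5 * yi + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(3)] for yi in y]
selected = bootstrap_rfe(X, y, n_keep=2)
print(selected)
```

Dividing by the standard error makes the ranking insensitive to the measurement scale of each feature, which is the scaling problem the abstract attributes to the plain absolute-coefficient criterion.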

Performance Evaluation of a Feature-Importance-based Feature Selection Method for Time Series Prediction

  • Hyun, Ahn
    • Journal of information and communication convergence engineering / v.21 no.1 / pp.82-89 / 2023
  • Various machine-learning models may yield high predictive power for time series prediction on massive time series. However, these models are prone to instability in terms of computational cost because of the high dimensionality of the feature space and nonoptimized hyperparameter settings. Considering the potential risk that model training with a high-dimensional feature set can be time-consuming, we evaluate a feature-importance-based feature selection method to derive a tradeoff between predictive power and computational cost for time series prediction. We used two machine learning techniques for performance evaluation to generate prediction models from a retail sales dataset. First, we ranked the features using impurity-based and Local Interpretable Model-agnostic Explanations (LIME)-based feature importance measures in the prediction models. Then, the recursive feature elimination method was applied to eliminate unimportant features sequentially. Consequently, we obtained a subset of features that could lead to reduced model training time while preserving acceptable model performance.

Cast-Shadow Elimination of Vehicle Objects Using Backpropagation Neural Network (신경망을 이용한 차량 객체의 그림자 제거)

  • Jeong, Sung-Hwan;Lee, Jun-Whoan
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.7 no.1 / pp.32-41 / 2008
  • Moving-object tracking in video-based vision monitoring uses the difference between a GMM (Gaussian Mixture Model)-based background and the current image. When an object is tracked using a binary image produced by thresholding, objects are merged not because of object information but because of cast shadows. This paper proposes a method that eliminates cast shadows using a backpropagation neural network. The neural network is trained on feature values extracted from the object regions and cast-shadow regions of training images taken from 10 videos. The method is based on distinguishing shadows in the binary image, and its performance is better (16.2%, 38.2%, 28.1%, 22.3%, 44.4%) than that of the existing cast-shadow elimination algorithms (SNP, SP, DNM1, DNM2, CNCC).


Prediction on the Ratio of Added Value in Industry Using Forecasting Combination based on Machine Learning Method (머신러닝 기법 기반의 예측조합 방법을 활용한 산업 부가가치율 예측 연구)

  • Kim, Jeong-Woo
    • The Journal of the Korea Contents Association / v.20 no.12 / pp.49-57 / 2020
  • This study predicts the ratio of added value, which represents the competitiveness of export industries in South Korea, using various machine learning techniques. To enhance the accuracy and stability of prediction, a forecast combination technique was applied to the values predicted by the machine learning techniques. In particular, this study improved the efficiency of the prediction process by selecting key variables from among many variables using the recursive feature elimination method and applying them to the machine learning techniques. As a result, the value predicted by the forecast combination method was found to be closer to the actual value than the values predicted by the individual machine learning techniques. In addition, the forecast combination method showed stable prediction results, unlike the volatile values predicted by the machine learning techniques.

Generalization of the Stream Network by the Geographic Hierarchy of Landform Data (지형자료의 계층화를 이용한 하계망 일반화)

  • Kim Nam-Shin
    • Journal of the Korean Geographical Society / v.40 no.4 s.109 / pp.441-453 / 2005
  • This study aims to generalize the stream network by developing an algorithm based on geographic hierarchy. Stream networks with a hierarchical system should be spatially hierarchized as linear features. The generalization procedure for stream networks consists of the stream hierarchy, selection and elimination, and the algorithm. Processing of the stream networks consists of determining the direction of the stream networks, ranking stroke segments, and ordering by the Strahler method, with geographic data queries used to control the selection and elimination of linear features by scale. The improved Simoo algorithm was effective in enhancing linear features and decreasing their curvature. As a result, it is expected to improve the generalization of features with various spatial hierarchies.
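The Strahler ordering and order-based selection mentioned in this abstract can be illustrated with a short sketch. The network representation (a dictionary mapping each segment to its upstream tributaries), the segment names, and the rule of keeping segments at or above a minimum order are assumptions for the example; the paper's own query and elimination rules are not given in the abstract.

```python
def strahler_orders(upstream, outlet):
    """Return {segment: Strahler order} for a tree-shaped network rooted at `outlet`."""
    orders = {}

    def visit(segment):
        tributaries = upstream.get(segment, [])
        if not tributaries:
            orders[segment] = 1  # a headwater segment has order 1
        else:
            child_orders = [visit(t) for t in tributaries]
            top = max(child_orders)
            # The order rises only where two or more tributaries share the highest order.
            orders[segment] = top + 1 if child_orders.count(top) >= 2 else top
        return orders[segment]

    visit(outlet)
    return orders

def select_by_order(orders, min_order):
    """Keep segments whose order is at least min_order (hypothetical scale rule)."""
    return {seg for seg, order in orders.items() if order >= min_order}

# Hypothetical network: each key lists the segments flowing into it.
network = {
    "trunk": ["main", "c"],
    "main": ["a", "b"],
    "a": ["a1", "a2"],
    "c": ["c1", "c2"],
}
orders = strahler_orders(network, "trunk")
kept = select_by_order(orders, min_order=2)
print(orders["trunk"], sorted(kept))
```

Raising `min_order` removes more low-order tributaries, which is one simple way to thin a network for a smaller map scale.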

Diagnosis of Alzheimer's Disease using Combined Feature Selection Method

  • Faisal, Fazal Ur Rehman;Khatri, Uttam;Kwon, Goo-Rak
    • Journal of Korea Multimedia Society / v.24 no.5 / pp.667-675 / 2021
  • Treatments for the symptoms of Alzheimer's disease (AD) are available, and much research on early diagnosis is under way. In this regard, several classification techniques using T1-weighted images have been proposed to distinguish among AD, MCI, and Healthy Control (HC) subjects. In this paper, we also use some traditional Machine Learning (ML) approaches to diagnose AD. This paper presents an improvised feature selection method that reduces model complexity, which is an issue when using ML approaches. In the presented work, a combination of subcortical and cortical features from 308 subjects of the ADNI dataset is used to diagnose AD from structural magnetic resonance imaging (sMRI) images. Three binary classification experiments were performed: AD vs eMCI, AD vs lMCI, and AD vs HC. The proposed feature selection method consists of a combination of Principal Component Analysis and the Recursive Feature Elimination method, used to reduce the dimension size and select the best features simultaneously. Experiments on the dataset demonstrated that SVM is best suited for the AD vs lMCI, AD vs HC, and AD vs eMCI classification, with accuracies of 95.83%, 97.83%, and 97.87%, respectively.

Korean Character Recognition by the Extraction of Feature Points and Neural Chip Design for its Preprocessing (특징점 추출에 의한 한글 문자 인식 및 전처리용 신경 칩의 설계)

  • 김종렬;정호선;이우일
    • Journal of the Korean Institute of Telematics and Electronics / v.27 no.6 / pp.929-936 / 1990
  • This paper describes a method of Korean character recognition by means of feature point extraction. In addition, a preprocessing neural chip for noise elimination, smoothing, thinning, and feature point extraction has been designed. The subpatterns were separated by means of an advanced index algorithm using masks and recognized by means of feature point classification. A separation rate of about 97% was obtained for the Korean character subpatterns, and a recognition rate of about 95% was obtained for the Korean characters. The preprocessing neural chip was simulated in SPICE and laid out using a double CMOS 2 µm design rule.


Application of Random Forest Algorithm for the Decision Support System of Medical Diagnosis with the Selection of Significant Clinical Test (의료진단 및 중요 검사 항목 결정 지원 시스템을 위한 랜덤 포레스트 알고리즘 적용)

  • Yun, Tae-Gyun;Yi, Gwan-Su
    • The Transactions of The Korean Institute of Electrical Engineers / v.57 no.6 / pp.1058-1062 / 2008
  • In clinical decision support systems (CDSS), unlike rule-based expert methods, an appropriate data-driven machine learning method can easily provide information on individual features (clinical tests) for disease classification. However, currently developed methods focus on improving classification accuracy for diagnosis. By analyzing feature importance in classification, one may infer novel clinical test sets that strongly differentiate specific diseases or disease states. Against this background, we introduce a novel CDSS that integrates a classifier and a feature selection module. The random forest algorithm is applied as both the classifier and the feature importance measure. The system selects the significant clinical tests that discriminate the diseases by examining the classification error during backward elimination of the features. The superior performance of the random forest algorithm in clinical classification was assessed against artificial neural network and decision tree algorithms using breast cancer, diabetes, and heart disease data from the UCI Machine Learning Repository. Tests with the same data sets show that the proposed system can successfully select the significant clinical test set for each disease.
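The idea of watching classification error during backward elimination can be sketched as below. This is only an illustration under stated assumptions: a nearest-centroid classifier and a standardized class-mean difference stand in for the random forest classifier and its importance measure, and the data, function names, and tie-breaking rule are hypothetical.

```python
import random
import statistics

def loo_error(X, y, feats):
    """Leave-one-out error of a nearest-centroid classifier on the given features."""
    errors = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [[X[k][j] for j in feats]
                    for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
        point = [X[i][j] for j in feats]
        pred = min(centroids, key=lambda lab: sum(
            (p - c) ** 2 for p, c in zip(point, centroids[lab])))
        errors += pred != y[i]
    return errors / len(X)

def importance(X, y, j):
    """Stand-in importance: class-mean difference scaled by the feature's spread."""
    pos = [x[j] for x, lab in zip(X, y) if lab == 1]
    neg = [x[j] for x, lab in zip(X, y) if lab == 0]
    spread = statistics.pstdev(pos + neg) or 1.0
    return abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread

def backward_elimination(X, y):
    """Remove the least important feature each round and record the error."""
    feats = list(range(len(X[0])))
    history = []
    while True:
        history.append((list(feats), loo_error(X, y, feats)))
        if len(feats) == 1:
            break
        feats.remove(min(feats, key=lambda j: importance(X, y, j)))
    # Choose the lowest error, preferring the smaller subset on ties.
    best = min(history, key=lambda h: (h[1], len(h[0])))
    return best, history

# Toy data (hypothetical): test 0 separates the classes, tests 1-3 are noise.
rng = random.Random(7)
y = [i % 2 for i in range(40)]
X = [[3.0 * lab + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(3)] for lab in y]
best, history = backward_elimination(X, y)
print(best)
```

The recorded `history` gives the error for each nested feature subset, so the selected tests are those in the subset where the error is lowest.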

Landslide susceptibility assessment using feature selection-based machine learning models

  • Liu, Lei-Lei;Yang, Can;Wang, Xiao-Mi
    • Geomechanics and Engineering / v.25 no.1 / pp.1-16 / 2021
  • Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs or conditioning factors for these models, however, can reduce the computation efficiency and increase the difficulty in collecting data. Feature selection is a good tool to address this problem by selecting the most important features among all factors to reduce the size of the input variables. However, two important questions need to be solved: (1) how do feature selection methods affect the performance of machine learning models? and (2) which feature selection method is the most suitable for a given machine learning model? This paper aims to address these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (i.e., logistic regression, support vector machine, artificial neural network, Gaussian process and random forest) and six typical feature selection methods in the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 landslides are used as recorded data. Next, feature selection methods are used to obtain the importance of the conditioning factors to create feature subsets, based on which 13 FS-ML models are constructed. For each of the machine learning models, the best FS-ML model is selected according to the area under curve value. Finally, five optimal FS-ML models are obtained and applied to the LSA of the studied area. The predictive abilities of the FS-ML models on LSA are verified and compared through the receiver operating characteristic curve and statistical indicators such as sensitivity, specificity and accuracy. The results showed that different feature selection methods have different effects on the performance of LSA machine learning models. FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the recursive feature elimination (RFE) optimized RF, and RFE is an optimal method for feature selection.
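The evaluation indicators named in this abstract (sensitivity, specificity, accuracy, and area under the ROC curve) can be computed as in the following minimal sketch. The labels, scores, and the 0.5 decision threshold are made-up values for illustration and are not taken from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def summary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, and accuracy from hard predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # share of positives correctly detected
        "specificity": tn / (tn + fp),  # share of negatives correctly rejected
        "accuracy": (tp + tn) / len(y_true),
    }

def auc(y_true, scores):
    """Area under the ROC curve as the share of correctly ordered (positive, negative) pairs."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: 1 = landslide cell, 0 = non-landslide cell.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption
metrics = summary_metrics(y_true, y_pred)
area = auc(y_true, scores)
print(metrics, area)
```

Unlike the three threshold-based indicators, the area under the curve depends only on how the scores rank positives against negatives, which is why it is convenient for comparing models, as the abstract describes.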