• Title/Summary/Keyword: Feature Set Selection


Prototype-based Classifier with Feature Selection and Its Design with Particle Swarm Optimization: Analysis and Comparative Studies

  • Park, Byoung-Jun;Oh, Sung-Kwun
    • Journal of Electrical Engineering and Technology / v.7 no.2 / pp.245-254 / 2012
  • In this study, we introduce a prototype-based classifier with feature selection that dwells upon the usage of a biologically inspired optimization technique of Particle Swarm Optimization (PSO). The design comprises two main phases. In the first phase, PSO selects P % of patterns to be treated as prototypes of c classes. During the second phase, the PSO is instrumental in the formation of a core set of features that constitute a collection of the most meaningful and highly discriminative coordinates of the original feature space. The proposed scheme of feature selection is developed in the wrapper mode with the performance evaluated with the aid of the nearest prototype classifier. The study offers a complete algorithmic framework and demonstrates the effectiveness (quality of solution) and efficiency (computing cost) of the approach when applied to a collection of selected data sets. We also include a comparative study which involves the usage of genetic algorithms (GAs). Numerical experiments show that a suitable selection of prototypes and a substantial reduction of the feature space could be accomplished and the classifier formed in this manner becomes characterized by low classification error. In addition, the advantage of the PSO is quantified in detail by running a number of experiments using Machine Learning datasets.
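As an illustration only (not code from the paper), the wrapper-mode scheme described above can be sketched as a binary PSO whose particles encode feature masks and whose fitness is the error of a nearest-prototype classifier; all function names and the toy setup below are assumptions:

```python
import random
import math

def nearest_prototype_error(X, y, prototypes, proto_labels, mask):
    """Error rate of a nearest-prototype classifier restricted to masked features."""
    errors = 0
    for xi, yi in zip(X, y):
        best, pred = float("inf"), None
        for p, pl in zip(prototypes, proto_labels):
            d = sum((xi[j] - p[j]) ** 2 for j, m in enumerate(mask) if m)
            if d < best:
                best, pred = d, pl
        errors += pred != yi
    return errors / len(X)

def binary_pso_feature_select(X, y, prototypes, proto_labels,
                              n_particles=10, n_iter=30, seed=0):
    """Wrapper-mode selection: each particle is a boolean feature mask."""
    rng = random.Random(seed)
    dim = len(X[0])

    def fitness(mask):
        if not any(mask):
            return 1.0  # an empty feature set is worthless
        return nearest_prototype_error(X, y, prototypes, proto_labels, mask)

    swarm = [[rng.random() < 0.5 for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in swarm]
    pbest_f = [fitness(p) for p in swarm]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]
    for _ in range(n_iter):
        for i, p in enumerate(swarm):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (0.7 * vel[i][j]
                             + 1.5 * r1 * (pbest[i][j] - p[j])
                             + 1.5 * r2 * (gbest[j] - p[j]))
                # sigmoid transfer: velocity sets the probability of bit j being on
                p[j] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j]))
            f = fitness(p)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = list(p), f
                if f < gbest_f:
                    gbest, gbest_f = list(p), f
    return gbest, gbest_f
```

In the paper's two-phase design the prototype set itself is also chosen by PSO; here the prototypes are simply passed in.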

Image Set Optimization for Real-Time Video Photomosaics (실시간 비디오 포토 모자이크를 위한 이미지 집합 최적화)

  • Choi, Yoon-Seok;Koo, Bon-Ki
    • 한국HCI학회:학술대회논문집 / 2009.02a / pp.502-507 / 2009
  • We present a real-time photomosaics method based on a small image set optimized by a feature selection method. A photomosaic is an image divided into cells (usually rectangular grids), each of which is replaced with another image of appropriate color, shape, and texture pattern. The method needs a large set of tile images covering various image patterns, but a large collection of photos incurs a high cost for pattern searching and a large space for storing the images. These requirements can cause problems in real-time applications or on mobile devices with limited resources. Our approach is a genetic feature selection method for building an optimized image set that accelerates pattern searching and minimizes memory cost.


A New Variable Selection Method Based on Mutual Information Maximization by Replacing Collinear Variables for Nonlinear Quantitative Structure-Property Relationship Models

  • Ghasemi, Jahan B.;Zolfonoun, Ehsan
    • Bulletin of the Korean Chemical Society / v.33 no.5 / pp.1527-1535 / 2012
  • Selection of the most informative molecular descriptors from the original data set is a key step in the development of quantitative structure activity/property relationship models. Recently, mutual information (MI) has gained increasing attention in feature selection problems. This paper presents an effective mutual information-based feature selection approach, named mutual information maximization by replacing collinear variables (MIMRCV), for nonlinear quantitative structure-property relationship models. The proposed variable selection method was applied to three different QSPR datasets: soil degradation half-lives of 47 organophosphorus pesticides, GC-MS retention times of 85 volatile organic compounds, and water-to-micellar cetyltrimethylammonium bromide partition coefficients of 62 organic compounds. The results revealed that using MIMRCV as the feature selection method improves the predictive quality of the developed models compared to conventional MI-based variable selection algorithms.
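As a loose, illustrative sketch (not the authors' MIMRCV implementation), MI-based selection with collinearity handling can be approximated by a greedy MI ranking in which a candidate collinear with an already-selected variable is passed over in favour of the next-best one; the histogram MI estimator and all names are assumptions:

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram-based MI estimate (in nats) between two 1-D arrays."""
    c_xy, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = c_xy / c_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0  # avoid log(0); marginals are nonzero wherever p_xy is
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

def select_by_mi(X, y, k=3, corr_threshold=0.9):
    """Greedy MI ranking; skip candidates collinear with selected ones."""
    order = sorted(range(X.shape[1]),
                   key=lambda j: mutual_information(X[:, j], y),
                   reverse=True)
    selected = []
    for j in order:
        if len(selected) == k:
            break
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) < corr_threshold
               for s in selected):
            selected.append(j)
    return selected
```

With a pair of nearly identical descriptors, only one of the two survives, and the freed slot goes to the next most informative variable.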

Identification of Chinese Event Types Based on Local Feature Selection and Explicit Positive & Negative Feature Combination

  • Tan, Hongye;Zhao, Tiejun;Wang, Haochang;Hong, Wan-Pyo
    • Journal of Information and Communication Convergence Engineering / v.5 no.3 / pp.233-238 / 2007
  • An approach to identifying Chinese event types is proposed in this paper, combining a good feature selection policy with a Maximum Entropy (ME) model. The approach not only effectively alleviates the problem that the classifier performs poorly on the small and difficult types, but also improves overall performance. Experiments on the ACE2005 corpus show satisfying performance, with an 83.5% macro-average F-measure. The main characteristics and ideas of the approach are: (1) an optimal feature set is built for each type through local feature selection, which ensures the performance of each type; (2) positive and negative features are explicitly discriminated and combined using one-sided metrics, which exploits the advantages of both kinds of features; (3) wrapper methods are used to search for new features and evaluate the various feature subsets to obtain the optimal feature subset.

A Study on Predicting TDI(Trophic Diatom Index) in tributaries of Han river basin using Correlation-based Feature Selection technique and Random Forest algorithm (Correlation-based Feature Selection 기법과 Random Forest 알고리즘을 이용한 한강유역 지류의 TDI 예측 연구)

  • Kim, Minkyu;Yoon, Chun Gyeong;Rhee, Han-Pil;Hwang, Soon-Jin;Lee, Sang-Woo
    • Journal of Korean Society on Water Environment / v.35 no.5 / pp.432-438 / 2019
  • The purpose of this study is to predict the Trophic Diatom Index (TDI) in tributaries of the Han River watershed using the random forest algorithm. One year (2017) of aquatic ecological health data was used. The data include water quality (BOD, T-N, $NH_3-N$, T-P, $PO_4-P$, water temperature, DO, pH, conductivity, turbidity), hydraulic factors (water width, average water depth, average velocity), and the TDI score. Seven factors, water temperature, BOD, T-N, $NH_3-N$, T-P, $PO_4-P$, and average water depth, were selected by Correlation-based Feature Selection, and a TDI prediction model was generated by random forest using these seven factors. To evaluate this model, the 2017 data set was used first. As a result, $R^2$, % difference, NSE (Nash-Sutcliffe Efficiency), RMSE (Root Mean Square Error), and accuracy rate show that the model is suitable for predicting TDI: $R^2$ is 0.93, % difference is -0.37, NSE is 0.89, RMSE is 8.22, and the accuracy rate is 70.4%. An additional evaluation using a data set with more than 17 times as many measurement points gave similar results. The Wilcoxon signed-rank test shows no statistically significant difference between actual and predicted data for the 2017 data set. These results can identify the elements that probably affect aquatic ecological health, and will provide direction for water quality management in a watershed that must be continuously preserved.
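The Correlation-based Feature Selection step can be sketched as below; this is an illustrative reimplementation of the standard CFS merit with greedy forward search, not the study's code, and all names are assumptions:

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit: k * mean|corr(feature, target)| /
    sqrt(k + k*(k-1) * mean|corr(feature, feature)|)."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(X, y):
    """Greedy forward search: add the feature that most improves the merit."""
    remaining = list(range(X.shape[1]))
    subset, best = [], -np.inf
    while remaining:
        m, j = max((cfs_merit(X, y, subset + [j]), j) for j in remaining)
        if m <= best:
            break  # no candidate improves the merit any further
        subset.append(j)
        remaining.remove(j)
        best = m
    return subset
```

The merit rewards features correlated with the target while penalizing features correlated with each other, so redundant copies of an already-selected variable are never added.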

A Feature Vector Selection Method for Cancer Classification

  • Yun, Zheng;Keong, Kwoh-Chee
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.23-28 / 2005
  • The high dimensionality and insufficiency of gene expression profiles and proteomic profiles make feature selection a critical step in efficiently building accurate models for cancer problems based on such data sets. In this paper, we use a method called the Discrete Function Learning algorithm to find discriminatory feature vectors based on information theory. The target feature vectors contain all or most of the information (in terms of entropy) of the class attribute. Two data sets were selected to validate our approach: one leukemia-subtype gene expression data set and one ovarian cancer proteomic data set. The experimental results show that our method generalizes well when applied to these insufficient and high-dimensional data sets. Furthermore, the obtained classifiers are highly understandable and accurate.


Classification of Epilepsy Using Distance-Based Feature Selection (거리 기반의 특징 선택을 이용한 간질 분류)

  • Lee, Sang-Hong
    • Journal of Digital Convergence / v.12 no.8 / pp.321-327 / 2014
  • Feature selection is a technique to improve classification performance by finding a minimal feature set, removing features that are mutually unrelated or redundant. This study proposes a new feature selection method using the distance between the centers of gravity of the bounded sums of weighted fuzzy membership functions (BSWFMs) provided by the neural network with weighted fuzzy membership functions (NEWFM). The distance-based feature selection finds a minimal feature set by removing, one by one, the worst feature, the one with the shortest distance between the centers of gravity of its BSWFMs, from the 24 initial features; 22 minimum features are selected with the highest performance. The proposed methodology achieves sensitivity, specificity, and accuracy of 97.7%, 99.7%, and 98.7%, respectively, with the 22 minimum features.
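As a rough stand-in (not NEWFM, and with a simple between-class centroid distance playing the role of the BSWFM centre-of-gravity distance), the backward-elimination loop can be sketched as:

```python
import numpy as np

def class_separation(X, y):
    """Per-feature separation: distance between the two class means,
    scaled by the overall standard deviation (a stand-in distance)."""
    X0, X1 = X[y == 0], X[y == 1]
    return np.abs(X0.mean(axis=0) - X1.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def backward_eliminate(X, y, n_keep):
    """Drop the feature with the shortest class separation, one at a time."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        sep = class_separation(X[:, keep], y)
        keep.pop(int(np.argmin(sep)))  # remove the worst remaining feature
    return keep
```

In the paper, each elimination step is validated by re-measuring classifier performance; this sketch only shows the removal order.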

Feature Filtering Methods for Web Documents Clustering (웹 문서 클러스터링에서의 자질 필터링 방법)

  • Park, Heum;Kwon, Hyuk-Chul
    • The KIPS Transactions: Part B / v.13B no.4 s.107 / pp.489-498 / 2006
  • Clustering results differ according to the data sets, and performance worsens even for web documents manually processed by an indexer, because although representative clusters for a feature can be obtained by statistical feature selection methods, irrelevant features (i.e., non-obvious features and those appearing in general documents) are not eliminated. Those irrelevant features should be eliminated to improve clustering performance. Therefore, this paper proposes three feature-filtering algorithms which consider feature values per document set, together with the distribution, frequency, and weights of features per document set: (1) a feature-filtering algorithm in a document (FFID), (2) a feature-filtering algorithm in a document matrix (FFIM), and (3) a hybrid method combining both FFID and FFIM (HFF). We tested clustering performance with feature selection using term frequency and expanded co-link information, and with feature filtering using the FFID, FFIM, and HFF methods. According to the results of our experiments, HFF had the best performance, while FFIM performed better than FFID.

Prioritizing Maintenance of Naval Command and Control System Using Feature Selection

  • Choi, Junhyeong;Kang, Dongsu
    • Journal of the Korea Society of Computer and Information / v.24 no.11 / pp.219-228 / 2019
  • Naval command and control systems are very important for operations, and their failures can be fatal in warfare. To prepare for such failures, we use feature selection, one of the data mining techniques. First, we analyze the Navy's failure data set from 2016 to 2018. We then derive the attributes associated with failure and predict failures using feature selection. We propose a method for prioritizing maintenance using the degree of association of these attributes, which improves the efficiency and economy of command and control system maintenance.

Feature Selection Effect of Classification Tree Using Feature Importance : Case of Credit Card Customer Churn Prediction (특성중요도를 활용한 분류나무의 입력특성 선택효과 : 신용카드 고객이탈 사례)

  • Yoon, Hanseong
    • Journal of Korea Society of Digital Industry and Information Management / v.20 no.2 / pp.1-10 / 2024
  • For the purpose of accurately predicting credit card customer churn through data analysis, a model can be constructed with various machine learning algorithms, including decision trees. Feature importance has been utilized to select input features that improve the performance of data analysis models in several application areas. In this paper, a method of utilizing feature importance calculated by the MDI method, and its effects, are investigated for the credit card customer churn prediction problem with classification trees. Compared with several random feature selections from the case data, a set of input features selected by higher feature importance shows higher predictive power. This can be an efficient method for choosing the input features needed to improve prediction performance. The method organized in this paper can be an alternative for selecting input features using feature importance when composing and using classification trees, including for credit card customer churn prediction.
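As a minimal, illustrative sketch of importance-based selection (not the paper's code), the Gini impurity decrease of the best depth-1 split per feature can serve as a stripped-down stand-in for tree-based MDI importance; all names and data are assumptions:

```python
import numpy as np

def gini(y):
    """Gini impurity of an integer label array."""
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def stump_importance(x, y):
    """Best achievable Gini impurity decrease from a single split on x
    (a depth-1 proxy for MDI feature importance)."""
    best = 0.0
    for t in np.unique(x)[:-1]:  # candidate thresholds between observed values
        left, right = y[x <= t], y[x > t]
        w = len(left) / len(y)
        decrease = gini(y) - w * gini(left) - (1 - w) * gini(right)
        best = max(best, decrease)
    return best

def rank_features(X, y):
    """Feature indices ordered from most to least important."""
    scores = [stump_importance(X[:, j], y) for j in range(X.shape[1])]
    return sorted(range(X.shape[1]), key=lambda j: scores[j], reverse=True)
```

Selecting the top-ranked features from such a ranking, instead of a random subset, is the comparison the paper reports.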