• Title/Summary/Keyword: Local feature selection

New Feature Selection Method for Text Categorization

  • Wang, Xingfeng;Kim, Hee-Cheol
    • Journal of information and communication convergence engineering, v.15 no.1, pp.53-61, 2017
  • The preferred feature selection methods for text classification are filter-based. In a common filter-based feature selection scheme, each feature is assigned a score, and the features are then sorted by score. The last step is to add the top-N features to the feature set. In this paper, we propose an improved global feature selection scheme in which this last step is modified to obtain a more representative feature set. The proposed method aims to improve the classification performance of global feature selection methods by creating a feature set that represents all classes almost equally. For this purpose, a local feature selection method is used to label features according to their discriminative power on each class, and these labels are used when producing the feature sets. Experimental results on the well-known 20 Newsgroups and Reuters-21578 datasets with the k-nearest neighbor algorithm and a support vector machine indicate that the proposed method improves classification performance in terms of the widely used $F_1$ metric.
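
The abstract does not spell out the labelling and filling rules. The Python sketch below assumes one plausible reading: each feature is labelled with the class on which its local score is highest, and the top-N set is then filled round-robin across classes, taking features within each class in order of their global score. The function name `balanced_top_n` and its input layout are hypothetical.

```python
from collections import defaultdict

def balanced_top_n(global_scores, local_scores, n):
    """Pick n features so that every class is represented roughly equally.

    global_scores: dict feature -> global score (e.g. chi-square or IG)
    local_scores:  dict feature -> dict class -> class-specific score
    """
    # Label each feature with the class on which it is most discriminative.
    buckets = defaultdict(list)
    for feat, per_class in local_scores.items():
        buckets[max(per_class, key=per_class.get)].append(feat)
    # Within each class bucket, order features by their global score.
    for label in buckets:
        buckets[label].sort(key=lambda f: global_scores[f], reverse=True)
    # Round-robin over classes so that no single class dominates the set.
    classes = sorted(buckets)
    selected, depth = [], 0
    while len(selected) < n and any(depth < len(buckets[c]) for c in classes):
        for c in classes:
            if depth < len(buckets[c]) and len(selected) < n:
                selected.append(buckets[c][depth])
        depth += 1
    return selected
```

With global scores a > b > c > d > e, where a, b, c lean toward class x and d, e toward class y, a plain top-4 cut would take a, b, c, d (three x-features); `balanced_top_n(..., 4)` returns `["a", "d", "b", "e"]` instead.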

Design of Lazy Classifier based on Fuzzy k-Nearest Neighbors and Reconstruction Error (퍼지 k-Nearest Neighbors 와 Reconstruction Error 기반 Lazy Classifier 설계)

  • Roh, Seok-Beom;Ahn, Tae-Chon
    • Journal of the Korean Institute of Intelligent Systems, v.20 no.1, pp.101-108, 2010
  • In this paper, we propose a new lazy classifier that combines a fuzzy k-nearest neighbors approach with feature selection based on reconstruction error. Reconstruction error is the performance index for locally linear reconstruction. When a new query point is given, the fuzzy k-nearest neighbors approach defines the local area in which the local classifier is applied and assigns weights to the data patterns within that area. After the local area is defined and the weights are assigned, feature selection is carried out to reduce the dimensionality of the feature space. Once features have been selected in terms of the reconstruction error, the local classifier, a polynomial model, is developed using weighted least squares estimation. The experimental evaluation also includes a comparative analysis against several commonly used methods, such as standard neural networks, support vector machines, linear discriminant analysis, and C4.5 trees.
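
The query-time steps can be sketched in Python under simplifying assumptions that are ours, not the paper's: the fuzzy k-NN weights use the standard inverse-distance form $1/d^{2/(m-1)}$ with fuzzifier $m$, the reconstruction-error feature selection is skipped (a single feature index is passed in instead), and the local polynomial is reduced to a straight line fitted by weighted least squares. All function names are hypothetical.

```python
import math

def fuzzy_knn_weights(query, points, k, m=2.0):
    """Return (index, weight) pairs for the k nearest points.

    Weights follow the fuzzy k-NN form 1 / d**(2 / (m - 1)), normalized to sum to 1.
    """
    nearest = sorted((math.dist(query, p), i) for i, p in enumerate(points))[:k]
    eps = 1e-12  # avoids division by zero when the query coincides with a point
    raw = [(i, 1.0 / (d + eps) ** (2.0 / (m - 1.0))) for d, i in nearest]
    total = sum(w for _, w in raw)
    return [(i, w / total) for i, w in raw]

def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y = b0 + b1 * x; needs at least two distinct xs."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def local_predict(query, points, targets, k, feature=0):
    """Fit a weighted local model on the k neighbors and evaluate it at the query."""
    nbrs = fuzzy_knn_weights(query, points, k)
    xs = [points[i][feature] for i, _ in nbrs]
    ys = [targets[i] for i, _ in nbrs]
    ws = [w for _, w in nbrs]
    b0, b1 = weighted_linear_fit(xs, ys, ws)
    return b0 + b1 * query[feature]
```

For classification, `targets` could hold 0/1 class indicators and the output be thresholded at 0.5; that decision rule is also an assumption.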

Identification of Chinese Event Types Based on Local Feature Selection and Explicit Positive & Negative Feature Combination

  • Tan, Hongye;Zhao, Tiejun;Wang, Haochang;Hong, Wan-Pyo
    • Journal of information and communication convergence engineering, v.5 no.3, pp.233-238, 2007
  • This paper proposes an approach to identifying Chinese event types that combines a good feature selection policy with a Maximum Entropy (ME) model. The approach not only effectively alleviates the problem of the classifier performing poorly on small and difficult types, but also improves overall performance. Experiments on the ACE2005 corpus show satisfactory performance, with a macro-averaged F-measure of 83.5%. The main characteristics and ideas of the approach are: (1) an optimal feature set is built for each type by local feature selection, which ensures the performance of each type; (2) positive and negative features are explicitly discriminated and combined using one-sided metrics, which exploits the advantages of both kinds of features; (3) wrapper methods are used to search for new features and to evaluate the various feature subsets in order to obtain the optimal feature subset.
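
The abstract does not name its one-sided metric. The Python sketch below assumes the correlation coefficient (CC), a common one-sided metric whose sign separates positive features (indicative of the class) from negative ones (indicative of other classes); its square is the chi-square statistic. Counts follow the usual contingency convention: `a` = class documents containing the term, `b` = other documents containing it, `c` = class documents without it, `d` = the rest. The function names and the per-class sizes `n_pos`/`n_neg` are hypothetical.

```python
import math

def correlation_coefficient(a, b, c, d):
    """Signed (one-sided) correlation coefficient between a term and a class.

    a: class docs containing the term    b: other docs containing the term
    c: class docs without the term       d: other docs without the term
    """
    n = a + b + c + d
    denom = math.sqrt((a + c) * (b + d) * (a + b) * (c + d))
    return 0.0 if denom == 0 else math.sqrt(n) * (a * d - c * b) / denom

def select_pos_neg(docs, labels, target, n_pos, n_neg):
    """Per-class (local) selection of positive and negative features.

    docs: list of sets of terms; labels: class label of each doc.
    """
    scores = {}
    for term in set().union(*docs):
        a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == target)
        b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != target)
        c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == target)
        d = len(docs) - a - b - c
        scores[term] = correlation_coefficient(a, b, c, d)
    ranked = sorted(scores, key=lambda t: (scores[t], t))  # most negative first
    positive = [t for t in reversed(ranked) if scores[t] > 0][:n_pos]
    negative = [t for t in ranked if scores[t] < 0][:n_neg]
    return positive, negative
```

Running `select_pos_neg` once per event type gives per-type (local) feature sets in the spirit of point (1); a wrapper search as in point (3) could then be used to tune `n_pos` and `n_neg`.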

Microblog User Geolocation by Extracting Local Words Based on Word Clustering and Wrapper Feature Selection

  • Tian, Hechan;Liu, Fenlin;Luo, Xiangyang;Zhang, Fan;Qiao, Yaqiong
    • KSII Transactions on Internet and Information Systems (TIIS), v.14 no.10, pp.3972-3988, 2020
  • Existing methods rely on statistical features to extract local words for microblog user geolocation. Many of the extracted words are non-local, which lowers geolocation accuracy. Considering both the statistical and semantic features of local words, this paper proposes a microblog user geolocation method that extracts local words based on word clustering and wrapper feature selection. First, ordinary words without positional indications are filtered out based on statistical features. Second, a word clustering algorithm based on word vectors is proposed: the remaining semantically similar words are clustered together according to the distance between their word vectors. Next, a wrapper feature selection algorithm based on sequential backward subset search is proposed to select the cluster subset with the best geolocation effect, and the words in the selected clusters are extracted as local words. Finally, a Naive Bayes classifier is trained on the local words to geolocate microblog users. The proposed method is validated on two different types of microblog data, Twitter and Weibo. The results show that it outperforms two typical existing methods based on statistical features in terms of accuracy, precision, recall, and F1-score.
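
A generic sequential backward subset search can be sketched in Python as follows. In the setting above, the items would be word clusters and `evaluate` a hypothetical callback returning geolocation accuracy on held-out users for a candidate cluster subset; the toy scorer in the usage note is purely illustrative.

```python
def sequential_backward_selection(items, evaluate, min_size=1):
    """Greedy wrapper search over subsets of `items`.

    At each step, try removing every remaining item, keep the removal that scores
    best, and remember the best-scoring subset seen at any size.
    """
    current = list(items)
    best_subset, best_score = list(current), evaluate(current)
    while len(current) > min_size:
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        score, current = max(((evaluate(c), c) for c in candidates), key=lambda sc: sc[0])
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score
```

For example, with `evaluate = lambda s: sum(1 if x in {"A", "C"} else -1 for x in s)`, calling `sequential_backward_selection(["A", "B", "C", "D"], evaluate)` returns `(["A", "C"], 2)`. Each step costs one evaluation per remaining item, so a wrapper of this kind is usually affordable only after the initial statistical filtering has shrunk the candidate set.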

Feature Selection via Embedded Learning Based on Tangent Space Alignment for Microarray Data

  • Ye, Xiucai;Sakurai, Tetsuya
    • Journal of Computing Science and Engineering, v.11 no.4, pp.121-129, 2017
  • Feature selection has been widely established as an efficient technique for microarray data analysis. It aims to find the most important feature (gene) subset of a given dataset according to its relevance to the current target. Unsupervised feature selection is considered challenging because of the lack of label information. In this paper, we propose a novel unsupervised feature selection method that incorporates embedded learning and $l_{2,1}$-norm sparse regression into a single framework to select genes in microarray data analysis. Local tangent space alignment is applied during embedded learning to preserve the local data structure. The $l_{2,1}$-norm sparse regression acts as a constraint that helps learn the gene weights jointly, so that the method selects informative genes that better capture the natural classes of the samples. We provide an effective algorithm to solve the resulting optimization problem. Finally, we evaluate the proposed method on real microarray gene expression datasets; the experimental results demonstrate that it achieves promising performance.
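
For a weight matrix $W \in \mathbb{R}^{d \times c}$ whose $i$-th row $w^i$ holds the weights of gene $i$, the norm is $\|W\|_{2,1} = \sum_{i=1}^{d} \|w^i\|_2$. Penalizing it tends to drive whole rows to zero, so genes can be ranked by their row norms. The Python sketch below shows only this scoring step, assuming $W$ has already been learned; it does not implement the tangent-space embedding or the paper's solver, and the function names are hypothetical.

```python
import math

def row_norms(W):
    """Euclidean norm of each row of W (one row per gene/feature)."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]

def l21_norm(W):
    """||W||_{2,1}: the sum of the row norms."""
    return sum(row_norms(W))

def top_k_features(W, k):
    """Indices of the k rows with the largest norms."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: norms[i], reverse=True)[:k]
```

For `W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]` the row norms are 5, 0 and 1, so `l21_norm(W)` is 6.0 and `top_k_features(W, 2)` returns `[0, 2]`; the all-zero row corresponds to a gene the sparse regression has discarded.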

Construction of Composite Feature Vector Based on Discriminant Analysis for Face Recognition (얼굴인식을 위한 판별분석에 기반한 복합특징 벡터 구성 방법)

  • Choi, Sang-Il
    • Journal of Korea Multimedia Society, v.18 no.7, pp.834-842, 2015
  • We propose a method to construct a composite feature vector based on discriminant analysis for face recognition. We first use a discriminant feature extraction method to extract holistic features from whole face images and local features from local images consisting of the discriminant pixels. To exploit the advantages of both holistic and local features, we evaluate the amount of discriminative information in each feature and then construct a composite feature vector from only those features that contain a large amount of discriminative information. Experimental results on the FERET, CMU-PIE, and Yale B databases show that the proposed composite feature vector improves face recognition performance.
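
The abstract does not say how the amount of discriminative information is quantified. The Python sketch below substitutes a per-feature Fisher ratio (between-class variance of the class means over the average within-class variance) and keeps every holistic or local feature whose ratio reaches a hypothetical `threshold`; the function names are also hypothetical.

```python
from statistics import mean, pvariance

def fisher_ratio(values, labels):
    """Between-class variance of the class means divided by the mean within-class variance."""
    groups = [[v for v, y in zip(values, labels) if y == c] for c in sorted(set(labels))]
    overall = mean(values)
    n = len(values)
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups) / n
    within = sum(len(g) * pvariance(g) for g in groups) / n
    if within == 0:
        return float("inf") if between > 0 else 0.0  # a constant feature scores 0
    return between / within

def build_composite(holistic, local, labels, threshold):
    """Keep holistic or local features whose Fisher ratio reaches threshold and
    concatenate the kept ones into one composite vector per sample."""
    blocks = {"holistic": holistic, "local": local}
    keep = []
    for name, block in blocks.items():
        for j in range(len(block[0])):
            if fisher_ratio([row[j] for row in block], labels) >= threshold:
                keep.append((name, j))
    vectors = [[blocks[name][i][j] for name, j in keep] for i in range(len(labels))]
    return keep, vectors
```

Here `holistic` and `local` are lists of per-sample feature rows (one row per face image), and `keep` records which source and column each composite entry came from.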

Performance Evaluation of a Feature-Importance-based Feature Selection Method for Time Series Prediction

  • Hyun, Ahn
    • Journal of information and communication convergence engineering, v.21 no.1, pp.82-89, 2023
  • Various machine-learning models can achieve high predictive power on massive time series. However, these models can be unstable in terms of computational cost because of the high dimensionality of the feature space and non-optimized hyperparameter settings. Because training a model on a high-dimensional feature set can be time-consuming, we evaluate a feature-importance-based feature selection method to find a tradeoff between predictive power and computational cost for time series prediction. For the performance evaluation, we used two machine learning techniques to generate prediction models from a retail sales dataset. First, we ranked the features using impurity-based and Local Interpretable Model-agnostic Explanations (LIME)-based feature importance measures in the prediction models. Then, recursive feature elimination was applied to eliminate unimportant features sequentially. As a result, we obtained a subset of features that could reduce model training time while preserving acceptable model performance.
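
Recursive feature elimination can be sketched in Python as below. The hypothetical callback `importance_fn` stands in for retraining a model on the remaining features and reading off impurity- or LIME-based importances; the toy callback in the usage note simply returns fixed scores, whereas a real model's importances would change as features are removed.

```python
def recursive_feature_elimination(features, importance_fn, n_keep, step=1):
    """Repeatedly refit (via importance_fn) and drop the least important features.

    importance_fn(features) -> dict mapping each feature to its importance.
    Returns the surviving features and the eliminated ones in removal order.
    """
    remaining = list(features)
    eliminated = []
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)  # "retrain" on the current subset
        n_drop = min(step, len(remaining) - n_keep)
        worst = sorted(remaining, key=lambda f: scores[f])[:n_drop]
        for f in worst:
            remaining.remove(f)
        eliminated.extend(worst)
    return remaining, eliminated
```

With fixed scores `{"price": 5, "promo": 3, "weekday": 2, "noise": 0.1}` and `n_keep=2`, the function drops `noise` and then `weekday`, leaving `["price", "promo"]`. Recording each intermediate subset's training time and error would give the predictive-power versus cost tradeoff described above.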

Hybrid Genetic Algorithms for Feature Selection and Classification Performance Comparisons (특징 선택을 위한 혼합형 유전 알고리즘과 분류 성능 비교)

  • Oh, Il-Seok;Lee, Jin-Seon;Moon, Byung-Ro
    • Journal of KIISE:Software and Applications, v.31 no.8, pp.1113-1120, 2004
  • This paper proposes a novel hybrid genetic algorithm (GA) for feature selection. Local search operations are devised and embedded in hybrid GAs to fine-tune the search. The operations are parameterized in terms of their fine-tuning power, and their effectiveness and time requirements are analyzed and compared. Experiments on various standard datasets revealed that the proposed hybrid GA is superior to a simple GA and to sequential search algorithms.
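
A hybrid GA of this general kind can be sketched in Python: a plain generational GA whose offspring are refined by a bit-flip hill-climbing operator. Treating `max_steps` as the fine-tuning power, the truncation selection, the one-point crossover, and all names are our assumptions; the paper's own local search operations are not described in the abstract.

```python
import random

def local_search(mask, fitness, max_steps):
    """First-improvement bit-flip hill climbing on a 0/1 feature mask.

    max_steps caps the number of accepted improvements, i.e. how strongly
    the operator fine-tunes a chromosome.
    """
    best, best_fit = list(mask), fitness(mask)
    for _ in range(max_steps):
        for i in range(len(best)):
            cand = best[:]
            cand[i] ^= 1  # add or drop feature i
            cand_fit = fitness(cand)
            if cand_fit > best_fit:
                best, best_fit = cand, cand_fit
                break
        else:
            break  # no single flip improves: a local optimum
    return best, best_fit

def hybrid_ga(n_features, fitness, pop_size=20, generations=30, ls_steps=3, seed=0):
    """Generational GA with truncation selection, one-point crossover,
    single-bit mutation, and local search applied to every child."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_features)] ^= 1  # mutation
            child, _ = local_search(child, fitness, ls_steps)  # the hybrid step
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

In feature selection, `fitness` would typically be a classifier's validation accuracy on the features a mask selects; raising `ls_steps` spends more evaluations per child in exchange for finer tuning, which is the kind of effectiveness-versus-time comparison the abstract mentions.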

Analysis of Problem Spaces and Algorithm Behaviors for Feature Selection (특징 선택을 위한 문제 공간과 알고리즘 동작 분석)

  • Lee, Jin-Seon;Oh, Il-Seok
    • Journal of KIISE:Software and Applications, v.33 no.6, pp.574-579, 2006
  • Feature selection algorithms should explore huge problem spaces broadly and efficiently to find a good solution. This paper attempts to gain insight into the fitness landscape of these spaces and to improve the search capability of the algorithms. We investigate the solution spaces in terms of statistics on local maxima and minima. We also analyze the behaviors of existing algorithms and improve their solutions.
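
For small problems, such landscape statistics can be gathered exhaustively. The Python sketch below enumerates every feature mask of length n and counts the masks that no single-bit flip strictly improves (non-strict local maxima, so every point on a plateau is counted). The bit-flip neighborhood and the function name are assumptions, and exhaustive enumeration is only feasible for small n.

```python
from itertools import product

def count_local_maxima(n, fitness):
    """Count masks of length n that no single-bit flip strictly improves."""
    count = 0
    for mask in product((0, 1), repeat=n):
        f = fitness(mask)
        neighbors = (mask[:i] + (1 - mask[i],) + mask[i + 1:] for i in range(n))
        if all(fitness(nb) <= f for nb in neighbors):
            count += 1
    return count
```

A fitness that simply counts selected features has one local maximum (the all-ones mask), while `lambda m: abs(sum(m) - 2)` on four bits has two (all zeros and all ones). Local minima can be counted the same way by passing the negated fitness.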

Combined Features with Global and Local Features for Gas Classification

  • Choi, Sang-Il
    • Journal of the Korea Society of Computer and Information, v.21 no.9, pp.11-18, 2016
  • In this paper, we propose a gas classification method using combined features for an electronic nose system that performs well even when some data are lost while measuring samples. We first divide the entire measurement of a data sample into three local sections (stabilization, exposure, and purge) and extract local features from each section. The amount of discriminative information in each feature is then measured based on discriminant analysis. Next, the local features with a large amount of discriminative information are combined with the global features extracted from the entire measurement section of the data sample. The experimental results show that the combined features produced by the proposed method give better classification performance than other feature types on a variety of volatile organic compound data, especially when there is data loss.
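
The sectioning step can be sketched in Python. The section boundaries (`exposure_start`, `purge_start`), the three toy descriptors per segment (mean, peak, net change), and the `keep_sections` argument standing in for the discriminant-analysis choice are all hypothetical; the abstract does not give the actual features, and the paper selects individual local features rather than whole sections.

```python
from statistics import mean

SECTIONS = ("stabilization", "exposure", "purge")

def split_sections(signal, exposure_start, purge_start):
    """Cut one sensor response into the three local sections (each must be non-empty)."""
    return {
        "stabilization": signal[:exposure_start],
        "exposure": signal[exposure_start:purge_start],
        "purge": signal[purge_start:],
    }

def describe(segment):
    """Toy descriptors for one segment: mean level, peak value, and net change."""
    return [mean(segment), max(segment), segment[-1] - segment[0]]

def combined_features(signal, exposure_start, purge_start, keep_sections):
    """Global descriptors of the whole signal followed by the local descriptors
    of the sections judged discriminative (keep_sections)."""
    sections = split_sections(signal, exposure_start, purge_start)
    vector = describe(signal)
    for name in SECTIONS:
        if name in keep_sections:
            vector += describe(sections[name])
    return vector
```

Because the global descriptors always come from the full measurement, a sample whose purge section is corrupted could still be represented by keeping only the unaffected sections, which loosely mirrors the robustness to data loss reported in the abstract.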