• Title/Summary/Keyword: Multi-label Feature Selection

Effective Multi-label Feature Selection based on Large Offspring Set created by Enhanced Evolutionary Search Process

  • Lim, Hyunki; Seo, Wangduk; Lee, Jaesung
    • Journal of the Korea Society of Computer and Information / v.23 no.9 / pp.7-13 / 2018
  • Recent advances in data gathering techniques have improved the capability to collect information, allowing a learning process between gathered data patterns and application sub-tasks. A pattern can be associated with multiple labels, demanding multi-label learning capability; as a result, multi-label feature selection has received significant attention because it can improve multi-label learning accuracy. However, existing evolutionary multi-label feature selection methods suffer from an ineffective search process. In this study, we propose an enhanced evolutionary search process for the multi-label feature selection problem. The proposed method creates a large set of offspring, i.e., new feature subsets, and then retains the most promising feature subset. Experimental results demonstrate that the proposed method identifies feature subsets that give good multi-label classification accuracy much faster than conventional methods.
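To make the general idea concrete, here is a minimal sketch (not the authors' algorithm) of an evolutionary feature-subset search for multi-label data that generates a large offspring set per generation and retains the most promising subset. It assumes scikit-learn and NumPy; the toy data, mutation rate, and the one-vs-rest logistic regression used to score subsets are illustrative choices only.

```python
# Generic "large offspring set, keep the best subset" search for multi-label
# feature selection; an illustration only, not the paper's method.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X, Y = make_multilabel_classification(n_samples=300, n_features=40,
                                      n_labels=3, random_state=0)

def fitness(mask):
    # Score a feature subset by cross-validated subset accuracy of a simple
    # one-vs-rest classifier (the default score for multi-label targets).
    if mask.sum() == 0:
        return 0.0
    clf = OneVsRestClassifier(LogisticRegression(max_iter=200))
    return cross_val_score(clf, X[:, mask], Y, cv=3).mean()

n_features, n_offspring, n_generations = X.shape[1], 30, 8
parent = rng.random(n_features) < 0.5            # random initial subset
best_mask, best_fit = parent, fitness(parent)

for _ in range(n_generations):
    # Create a large offspring set by bit-flip mutation of the parent subset.
    flips = rng.random((n_offspring, n_features)) < 0.1
    offspring = np.logical_xor(parent, flips)
    scores = np.array([fitness(m) for m in offspring])
    k = scores.argmax()
    if scores[k] >= best_fit:                    # retain the most promising subset
        best_mask, best_fit = offspring[k], scores[k]
        parent = offspring[k]

print("selected features:", np.flatnonzero(best_mask), "cv score:", round(best_fit, 3))
```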

Sparse and low-rank feature selection for multi-label learning

  • Lim, Hyunki
    • Journal of the Korea Society of Computer and Information / v.26 no.7 / pp.1-7 / 2021
  • In this paper, we propose a feature selection technique for multi-label classification. Many existing feature selection techniques select features by calculating a relation between features and labels, such as a mutual information measure. However, since the mutual information measure requires a joint probability, it is difficult to estimate this joint probability over a real feature set. Such methods therefore have the disadvantage that only a few features can be evaluated at a time, so only local optimization is possible. To move beyond this local optimization problem, we propose a feature selection technique that constructs a low-rank space over the entire given feature space and selects features with sparsity. To this end, we design a regression-based objective function using the nuclear norm and propose a gradient descent algorithm to solve the resulting optimization problem. In multi-label classification experiments on four datasets with three multi-label classification performance measures, the proposed method showed better performance than existing feature selection techniques. In addition, experimental results showed that its performance is insensitive to changes in the parameter values of the proposed objective function.
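As a rough illustration of nuclear-norm-regularized, regression-based feature selection, the NumPy sketch below minimizes 0.5*||XW - Y||_F^2 + lambda*||W||_* by proximal gradient (singular value thresholding) and ranks features by the row norms of W. The paper's exact objective, sparsity term, and solver are not reproduced; the synthetic data and parameter values are placeholders.

```python
import numpy as np

def svt(W, tau):
    # Singular value thresholding: proximal operator of tau * nuclear norm.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def lowrank_feature_scores(X, Y, lam=0.1, n_iter=200):
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)      # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)                  # gradient of 0.5 * ||XW - Y||_F^2
        W = svt(W - step * grad, step * lam)      # proximal (low-rank) step
    return np.linalg.norm(W, axis=1)              # row norms serve as feature scores

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
Y = (X[:, :3] @ rng.normal(size=(3, 4)) + 0.1 * rng.normal(size=(200, 4)) > 0).astype(float)
scores = lowrank_feature_scores(X, Y)
print("top features:", np.argsort(scores)[::-1][:5])
```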

Exploring the Performance of Multi-Label Feature Selection for Effective Decision-Making: Focusing on Sentiment Analysis (효과적인 의사결정을 위한 다중레이블 기반 속성선택 방법에 관한 연구: 감성 분석을 중심으로)

  • Jong Yoon Won; Kun Chang Lee
    • Information Systems Review / v.25 no.1 / pp.47-73 / 2023
  • Management decision-making based on artificial intelligence (AI) plays an important role in helping decision-makers, and AI-centered business decision-making is regarded as a driving force for corporate growth. AI grounded in accurate analysis techniques can support decision-makers in making high-quality decisions. This study proposes an effective decision-making method based on multi-label feature selection. Specifically, we present CFS-BR (Correlation-based Feature Selection based on the Binary Relevance approach), which reduces data sets in high-dimensional space. Analyses of sample data and empirical data show that CFS-BR can support efficient decision-making by selecting the best combination of meaningful attributes using the best-first search algorithm. In addition, compared to previous multi-label feature selection methods, CFS-BR achieves higher accuracy and is thus useful for increasing the effectiveness of decision-making.
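A simplified sketch of the CFS-BR idea follows: labels are decomposed in binary-relevance fashion and feature subsets are scored with the standard CFS merit (average feature-label correlation weighed against average feature-feature correlation). A greedy forward search stands in for the best-first search used in the paper, so this illustrates the principle rather than the exact procedure; the synthetic data is a placeholder.

```python
import numpy as np

def cfs_merit(X, y_cols, subset):
    # CFS merit: k*r_cf / sqrt(k + k*(k-1)*r_ff)
    k = len(subset)
    if k == 0:
        return 0.0
    # average |correlation| between selected features and every label column
    rcf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1])
                   for f in subset for y in y_cols])
    if k == 1:
        rff = 1.0
    else:
        # average |correlation| among the selected features themselves
        rff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                       for i, a in enumerate(subset) for b in subset[i + 1:]])
    return (k * rcf) / np.sqrt(k + k * (k - 1) * rff)

def greedy_cfs_br(X, Y, max_features=5):
    y_cols = [Y[:, j] for j in range(Y.shape[1])]   # binary-relevance view of the labels
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        merits = [(cfs_merit(X, y_cols, selected + [f]), f) for f in remaining]
        best_merit, best_f = max(merits)
        if selected and best_merit <= cfs_merit(X, y_cols, selected):
            break                                    # no improvement: stop searching
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 20))
Y = (X[:, [0, 3, 7]] + 0.2 * rng.normal(size=(150, 3)) > 0).astype(int)
print("selected:", greedy_cfs_br(X, Y))
```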

Approach to diagnosing multiple abnormal events with single-event training data

  • Ji Hyeon Shin; Seung Gyu Cho; Seo Ryong Koo; Seung Jun Lee
    • Nuclear Engineering and Technology / v.56 no.2 / pp.558-567 / 2024
  • Diagnostic support systems are being researched to assist operators in identifying and responding to abnormal events in a nuclear power plant. Most studies to date have considered single abnormal events only, for which it is relatively straightforward to obtain data to train the deep learning model of the diagnostic support system. However, cases in which multiple abnormal events occur must also be considered, for which obtaining training data becomes difficult due to the large number of combinations of possible abnormal events. This study proposes an approach to maintain diagnostic performance for multiple abnormal events by training a deep learning model with data on single abnormal events only. The proposed approach is applied to an existing algorithm that can perform feature selection and multi-label classification. We choose an extremely randomized trees classifier to select dedicated monitoring parameters for target abnormal events. To diagnose each event occurrence independently, two-channel convolutional neural networks are employed as sub-models. The algorithm was tested in a case study with various scenarios, including single and multiple abnormal events. Results demonstrated that the proposed approach maintained diagnostic performance for 15 single abnormal events and significantly improved performance for 105 multiple abnormal events compared to the base model.
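The two-stage structure described above can be sketched as follows: an extremely randomized trees classifier picks dedicated monitoring parameters for each target event, and an independent binary sub-model per event is trained on single-event data, which is what lets simultaneous events be flagged at diagnosis time. The paper's two-channel convolutional sub-models are replaced by plain logistic regressions purely for illustration, and the plant data is replaced by synthetic samples.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_events, n_params = 4, 30
X = rng.normal(size=(400, n_params))
# synthetic single-event training labels: exactly one event active per sample
Y = np.zeros((400, n_events), dtype=int)
Y[np.arange(400), rng.integers(0, n_events, 400)] = 1

sub_models, selected = [], []
for j in range(n_events):
    # select dedicated monitoring parameters for event j via feature importances
    et = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, Y[:, j])
    top = np.argsort(et.feature_importances_)[::-1][:5]
    selected.append(top)
    sub_models.append(LogisticRegression(max_iter=500).fit(X[:, top], Y[:, j]))

def diagnose(x):
    # each sub-model judges its own event independently, so multiple
    # simultaneous events can be flagged even with single-event training data
    return [int(m.predict(x[sel].reshape(1, -1))[0]) for m, sel in zip(sub_models, selected)]

print(diagnose(X[0]))
```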

Feature selection for text data via topic modeling (토픽 모형을 이용한 텍스트 데이터의 단어 선택)

  • Jang, Woosol; Kim, Ye Eun; Son, Won
    • The Korean Journal of Applied Statistics / v.35 no.6 / pp.739-754 / 2022
  • Usually, text data consists of many variables, some of which are closely correlated. Such multi-collinearity often results in inefficient or inaccurate statistical analysis. For supervised learning, one can select features by examining the relationship between the target variables and the explanatory variables. For unsupervised learning, however, target variables are absent, so such a feature selection procedure cannot be applied. In this study, we propose a word selection procedure that employs topic models to find latent topics. We substitute topics for the target variables and select terms that show high relevance for each topic. Applying the procedure to real data, we found that the proposed word selection procedure gives clear topic interpretation by removing high-frequency words prevalent across topics. In addition, we observed that, by applying the selected variables to classifiers such as naïve Bayes classifiers and support vector machines, the proposed feature selection procedure gives results comparable to those obtained by using class label information.
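A small sketch of the general procedure, using scikit-learn's LDA: fit a topic model, then keep for each topic the terms whose relevance (topic-specific probability weighed against overall frequency) is highest, which tends to push out high-frequency words shared across topics. The relevance weighting and the toy corpus are illustrative assumptions, not the paper's exact selection rule.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "dogs and cats are pets",
        "stock prices fell sharply today", "the market rallied on earnings",
        "pet owners love their dogs", "investors watch the stock market"]

vec = CountVectorizer()
X = vec.fit_transform(docs)
vocab = np.array(vec.get_feature_names_out())

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)  # p(w|t)
overall = np.asarray(X.sum(axis=0)).ravel() / X.sum()                      # p(w) in the corpus

lam, top_k = 0.6, 5
for t, pwt in enumerate(topic_word):
    # relevance mixes topic-specific probability with its lift over overall frequency
    relevance = lam * np.log(pwt) + (1 - lam) * np.log(pwt / overall)
    print(f"topic {t}:", vocab[np.argsort(relevance)[::-1][:top_k]])
```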

An Analytical Study on Automatic Classification of Domestic Journal articles Using Random Forest (랜덤포레스트를 이용한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.36 no.2 / pp.57-77 / 2019
  • Random Forest (RF), a representative ensemble technique, was applied to the automatic classification of journal articles in the field of library and information science. In particular, I performed various experiments on the main factors affecting classification performance, such as the number of trees, feature selection, and learning set size, when automatically assigning class labels to domestic journal articles. Through this, I explored ways to optimize the performance of Random Forest (RF) for imbalanced datasets in real environments. Consequently, for the automatic classification of domestic journal articles, Random Forest (RF) can be expected to deliver the best classification performance when using a number of trees in the range 100~1000, a small feature set (10%) selected by the chi-square statistic (CHI), and the largest learning sets (9-10 years).
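The experimental pipeline described above can be sketched compactly with scikit-learn: chi-square feature selection keeping a small percentage of terms, followed by a Random Forest with a tree count in the 100~1000 range. The toy corpus, labels, and parameter values below are placeholders for the journal-article data used in the study.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.pipeline import make_pipeline

docs = ["retrieval model for digital libraries", "library cataloging standards",
        "user study of search behaviour", "metadata schema for archives",
        "query log analysis of a search engine", "archival description practice"]
labels = ["IR", "cataloging", "user", "metadata", "IR", "metadata"]

pipe = make_pipeline(
    TfidfVectorizer(),
    SelectPercentile(chi2, percentile=10),                       # small feature set based on CHI
    RandomForestClassifier(n_estimators=500, random_state=0),    # tree count in the 100~1000 range
)
pipe.fit(docs, labels)
print(pipe.predict(["search engine evaluation"]))
```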

A Nature-inspired Multiple Kernel Extreme Learning Machine Model for Intrusion Detection

  • Shen, Yanping; Zheng, Kangfeng; Wu, Chunhua; Yang, Yixian
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.2 / pp.702-723 / 2020
  • The application of machine learning (ML) in intrusion detection has attracted much attention with the rapid growth of information security threats. As an efficient multi-label classifier, the kernel extreme learning machine (KELM) has gradually come into use in intrusion detection systems. However, the performance of KELM relies heavily on kernel selection. In this paper, a novel multiple kernel extreme learning machine (MKELM) model combining ReliefF with nature-inspired methods is proposed for intrusion detection. The MKELM is designed to estimate whether an attack has been carried out, and ReliefF is used as a preprocessor of the MKELM to select appropriate features. In addition, nature-inspired methods whose fitness functions are defined based on kernel alignment are employed to build the optimal composite kernel in the MKELM. The KDD99, NSL and Kyoto datasets are used to evaluate the performance of the model. The experimental results indicate that the optimal composite kernel function can be determined using any of the heuristic optimization methods, including PSO, GA, GWO, BA and DE. Since the filter-based feature selection method is combined with a multiple kernel learning approach that is independent of the classifier, the proposed model achieves good performance while saving a large amount of training time.
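The two ingredients named above, a composite kernel whose weights are chosen by maximizing kernel alignment with the label kernel and a KELM-style closed-form solution, can be sketched in NumPy as below. A simple random search stands in for the nature-inspired optimizers (PSO, GA, GWO, BA, DE), ReliefF preprocessing is omitted, and the data is synthetic, so this is only an outline of the approach.

```python
import numpy as np

def rbf(X, gamma):
    # RBF base kernel on the training points
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def alignment(K, T):
    # (uncentred) kernel alignment between kernel K and target kernel T
    return (K * T).sum() / (np.linalg.norm(K) * np.linalg.norm(T))

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(float) * 2 - 1    # labels in {-1, +1}
T = np.outer(y, y)                                   # ideal (label) kernel

bases = [rbf(X, g) for g in (0.01, 0.1, 1.0)]
best_w, best_a = None, -np.inf
for _ in range(200):                                  # random search over composite-kernel weights
    w = rng.random(len(bases)); w /= w.sum()
    K = sum(wi * Ki for wi, Ki in zip(w, bases))
    a = alignment(K, T)
    if a > best_a:
        best_w, best_a = w, a

K = sum(wi * Ki for wi, Ki in zip(best_w, bases))
C = 10.0
beta = np.linalg.solve(K + np.eye(len(y)) / C, y)    # KELM-style output weights
pred = np.sign(K @ beta)
print("alignment:", round(best_a, 3), "train accuracy:", (pred == y).mean())
```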