• Title/Summary/Keyword: Feature selection optimization

Search Result 96, Processing Time 0.041 seconds

Feature Selection Using Submodular Approach for Financial Big Data

  • Attigeri, Girija;Manohara Pai, M.M.;Pai, Radhika M.
    • Journal of Information Processing Systems
    • /
    • v.15 no.6
    • /
    • pp.1306-1325
    • /
    • 2019
  • As the world is moving towards digitization, data is generated from various sources at a faster rate. It is getting humungous and is termed as big data. The financial sector is one domain which needs to leverage the big data being generated to identify financial risks, fraudulent activities, and so on. The design of predictive models for such financial big data is imperative for maintaining the health of the country's economics. Financial data has many features such as transaction history, repayment data, purchase data, investment data, and so on. The main problem in predictive algorithm is finding the right subset of representative features from which the predictive model can be constructed for a particular task. This paper proposes a correlation-based method using submodular optimization for selecting the optimum number of features and thereby, reducing the dimensions of the data for faster and better prediction. The important proposition is that the optimal feature subset should contain features having high correlation with the class label, but should not correlate with each other in the subset. Experiments are conducted to understand the effect of the various subsets on different classification algorithms for loan data. The IBM Bluemix BigData platform is used for experimentation along with the Spark notebook. The results indicate that the proposed approach achieves considerable accuracy with optimal subsets in significantly less execution time. The algorithm is also compared with the existing feature selection and extraction algorithms.

Unsupervised feature selection using orthogonal decomposition and low-rank approximation

  • Lim, Hyunki
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.5
    • /
    • pp.77-84
    • /
    • 2022
  • In this paper, we propose a novel unsupervised feature selection method. Conventional unsupervised feature selection method defines virtual label and uses a regression analysis that projects the given data to this label. However, since virtual labels are generated from data, they can be formed similarly in the space. Thus, in the conventional method, the features can be selected in only restricted space. To solve this problem, in this paper, features are selected using orthogonal projections and low-rank approximations. To solve this problem, in this paper, a virtual label is projected to orthogonal space and the given data set is also projected to this space. Through this process, effective features can be selected. In addition, projection matrix is restricted low-rank to allow more effective features to be selected in low-dimensional space. To achieve these objectives, a cost function is designed and an efficient optimization method is proposed. Experimental results for six data sets demonstrate that the proposed method outperforms existing conventional unsupervised feature selection methods in most cases.

AutoFe-Sel: A Meta-learning based methodology for Recommending Feature Subset Selection Algorithms

  • Irfan Khan;Xianchao Zhang;Ramesh Kumar Ayyasam;Rahman Ali
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.7
    • /
    • pp.1773-1793
    • /
    • 2023
  • Automated machine learning, often referred to as "AutoML," is the process of automating the time-consuming and iterative procedures that are associated with the building of machine learning models. There have been significant contributions in this area across a number of different stages of accomplishing a data-mining task, including model selection, hyper-parameter optimization, and preprocessing method selection. Among them, preprocessing method selection is a relatively new and fast growing research area. The current work is focused on the recommendation of preprocessing methods, i.e., feature subset selection (FSS) algorithms. One limitation in the existing studies regarding FSS algorithm recommendation is the use of a single learner for meta-modeling, which restricts its capabilities in the metamodeling. Moreover, the meta-modeling in the existing studies is typically based on a single group of data characterization measures (DCMs). Nonetheless, there are a number of complementary DCM groups, and their combination will allow them to leverage their diversity, resulting in improved meta-modeling. This study aims to address these limitations by proposing an architecture for preprocess method selection that uses ensemble learning for meta-modeling, namely AutoFE-Sel. To evaluate the proposed method, we performed an extensive experimental evaluation involving 8 FSS algorithms, 3 groups of DCMs, and 125 datasets. Results show that the proposed method achieves better performance compared to three baseline methods. The proposed architecture can also be easily extended to other preprocessing method selections, e.g., noise-filter selection and imbalance handling method selection.

A Hybrid PSO-BPSO Based Kernel Extreme Learning Machine Model for Intrusion Detection

  • Shen, Yanping;Zheng, Kangfeng;Wu, Chunhua
    • Journal of Information Processing Systems
    • /
    • v.18 no.1
    • /
    • pp.146-158
    • /
    • 2022
  • With the success of the digital economy and the rapid development of its technology, network security has received increasing attention. Intrusion detection technology has always been a focus and hotspot of research. A hybrid model that combines particle swarm optimization (PSO) and kernel extreme learning machine (KELM) is presented in this work. Continuous-valued PSO and binary PSO (BPSO) are adopted together to determine the parameter combination and the feature subset. A fitness function based on the detection rate and the number of selected features is proposed. The results show that the method can simultaneously determine the parameter values and select features. Furthermore, competitive or better accuracy can be obtained using approximately one quarter of the raw input features. Experiments proved that our method is slightly better than the genetic algorithm-based KELM model.

Predicting Corporate Bankruptcy using Simulated Annealing-based Random Fores (시뮬레이티드 어니일링 기반의 랜덤 포레스트를 이용한 기업부도예측)

  • Park, Hoyeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.155-170
    • /
    • 2018
  • Predicting a company's financial bankruptcy is traditionally one of the most crucial forecasting problems in business analytics. In previous studies, prediction models have been proposed by applying or combining statistical and machine learning-based techniques. In this paper, we propose a novel intelligent prediction model based on the simulated annealing which is one of the well-known optimization techniques. The simulated annealing is known to have comparable optimization performance to the genetic algorithms. Nevertheless, since there has been little research on the prediction and classification of business decision-making problems using the simulated annealing, it is meaningful to confirm the usefulness of the proposed model in business analytics. In this study, we use the combined model of simulated annealing and machine learning to select the input features of the bankruptcy prediction model. Typical types of combining optimization and machine learning techniques are feature selection, feature weighting, and instance selection. This study proposes a combining model for feature selection, which has been studied the most. In order to confirm the superiority of the proposed model in this study, we apply the real-world financial data of the Korean companies and analyze the results. The results show that the predictive accuracy of the proposed model is better than that of the naïve model. Notably, the performance is significantly improved as compared with the traditional decision tree, random forests, artificial neural network, SVM, and logistic regression analysis.

Image Set Optimization for Real-Time Video Photomosaics (실시간 비디오 포토 모자이크를 위한 이미지 집합 최적화)

  • Choi, Yoon-Seok;Koo, Bon-Ki
    • 한국HCI학회:학술대회논문집
    • /
    • 2009.02a
    • /
    • pp.502-507
    • /
    • 2009
  • We present a real-time photomosaics method for small image set optimized by feature selection method. Photomosaics is an image that is divided into cells (usually rectangular grids), each of which is replaced with another image of appropriate color, shape and texture pattern. This method needs large set of tile images which have various types of image pattern. But large amount of photo images requires high cost for pattern searching and large space for saving the images. These requirements can cause problems in the application to a real-time domain or mobile devices with limited resources. Our approach is a genetic feature selection method for building an optimized image set to accelerate pattern searching speed and minimize the memory cost.

  • PDF

Support vector machines with optimal instance selection: An application to bankruptcy prediction

  • Ahn Hyun-Chul;Kim Kyoung-Jae;Han In-Goo
    • Proceedings of the Korea Inteligent Information System Society Conference
    • /
    • 2006.06a
    • /
    • pp.167-175
    • /
    • 2006
  • Building accurate corporate bankruptcy prediction models has been one of the most important research issues in finance. Recently, support vector machines (SVMs) are popularly applied to bankruptcy prediction because of its many strong points. However, in order to use SVM, a modeler should determine several factors by heuristics, which hinders from obtaining accurate prediction results by using SVM. As a result, some researchers have tried to optimize these factors, especially the feature subset and kernel parameters of SVM But, there have been no studies that have attempted to determine appropriate instance subset of SVM, although it may improve the performance by eliminating distorted cases. Thus in the study, we propose the simultaneous optimization of the instance selection as well as the parameters of a kernel function of SVM by using genetic algorithms (GAs). Experimental results show that our model outperforms not only conventional SVM, but also prior approaches for optimizing SVM.

  • PDF

Feature selection using genetic algorithm for constructing time-series modelling

  • Oh, Sang-Keon;Hong, Sun-Gi;Kim, Chang-Hyun;Lee, Ju-Jang
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2001.10a
    • /
    • pp.102.4-102
    • /
    • 2001
  • An evolutionary structure optimization method for the Gaussian radial basis function (RBF) network is presented, for modelling and predicting nonlinear time series. Generalization performance is significantly improved with a much smaller network, compared with that of the usual clustering and least square learning method.

  • PDF

Design of Prototype-Based Emotion Recognizer Using Physiological Signals

  • Park, Byoung-Jun;Jang, Eun-Hye;Chung, Myung-Ae;Kim, Sang-Hyeob
    • ETRI Journal
    • /
    • v.35 no.5
    • /
    • pp.869-879
    • /
    • 2013
  • This study is related to the acquisition of physiological signals of human emotions and the recognition of human emotions using such physiological signals. To acquire physiological signals, seven emotions are evoked through stimuli. Regarding the induced emotions, the results of skin temperature, photoplethysmography, electrodermal activity, and an electrocardiogram are recorded and analyzed as physiological signals. The suitability and effectiveness of the stimuli are evaluated by the subjects themselves. To address the problem of the emotions not being recognized, we introduce a methodology for a recognizer using prototype-based learning and particle swarm optimization (PSO). The design involves two main phases: i) PSO selects the P% of the patterns to be treated as prototypes of the seven emotions; ii) PSO is instrumental in the formation of the core set of features. The experiments show that a suitable selection of prototypes and a substantial reduction of the feature space can be accomplished, and the recognizer formed in this manner is characterized by high recognition accuracy for the seven emotions using physiological signals.

An optimal feature selection algorithm for the network intrusion detection system (네트워크 침입 탐지를 위한 최적 특징 선택 알고리즘)

  • Jung, Seung-Hyun;Moon, Jun-Geol;Kang, Seung-Ho
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2014.10a
    • /
    • pp.342-345
    • /
    • 2014
  • Network intrusion detection system based on machine learning methods is quite dependent on the selected features in terms of accuracy and efficiency. Nevertheless, choosing the optimal combination of features from generally used features to detect network intrusion requires extensive computing resources. For instance, the number of possible feature combinations from given n features is $2^n-1$. In this paper, to tackle this problem we propose a optimal feature selection algorithm. Proposed algorithm is based on the local search algorithm, one of representative meta-heuristic algorithm for solving optimization problem. In addition, the accuracy of clusters which obtained using selected feature components and k-means clustering algorithm is adopted to evaluate a feature assembly. In order to estimate the performance of our proposed algorithm, comparing with a method where all features are used on NSL-KDD data set and multi-layer perceptron.

  • PDF