• Title/Abstract/Keyword: feature vector selection


Variable Selection of Feature Pattern using SVM-based Criterion with Q-Learning in Reinforcement Learning (SVM-기반 제약 조건과 강화학습의 Q-learning을 이용한 변별력이 확실한 특징 패턴 선택)

  • Kim, Chayoung
    • Journal of Internet Computing and Services / v.20 no.4 / pp.21-27 / 2019
  • Feature patterns gathered from RNA sequencing (RNA-seq) data are not all equally informative for identifying differential expression: some may be noisy, correlated, or irrelevant because of redundancy in big data sets. Variable selection of feature patterns aims to find a differentially expressed gene set that is significantly relevant to a specific task. These issues are complex and important in many domains. In machine learning, feature pattern selection has been studied with methods such as Random Forest, K-Nearest Neighbor, and Support Vector Machine (SVM). SVM is one of the most well-known and classical machine learning algorithms. One SVM-based criterion is Support Vector Machine-Recursive Feature Elimination (SVM-RFE), which is used in our research. We propose a novel algorithm that combines SVM-RFE with Q-learning, a reinforcement learning method, for better variable selection of feature patterns. A comparison of the proposed algorithm with the well-known SVM-RFE combined with Welch's T on published data shows that the criterion derived from the SVM-RFE weight vector is improved by the off-policy, more exploratory scheme of Q-learning.
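The abstract above describes Q-learning as off-policy. A minimal sketch of the standard tabular Q-learning update may make that concrete: the target uses the greedy value of the next state, regardless of which action the agent actually explores next. How the paper maps feature elimination onto states, actions, and rewards is not given in the abstract, so the state and action names in the usage note are purely hypothetical.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the new Q-value.

    q is a dict keyed by (state, action); missing entries count as 0.0.
    The target uses max over next actions, which is what makes the
    update off-policy.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```

A hypothetical call would look like `q_learning_update(q, "all_features", "drop_f1", 1.0, "without_f1", ["drop_f1", "drop_f2"])`, where the reward might be a change in classification accuracy; that choice is an assumption, not something the abstract states.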

Linear SVM-Based Android Malware Detection and Feature Selection for Performance Improvement (선형 SVM을 사용한 안드로이드 기반의 악성코드 탐지 및 성능 향상을 위한 Feature 선정)

  • Kim, Ki-Hyun;Choi, Mi-Jung
    • The Journal of Korean Institute of Communications and Information Sciences / v.39C no.8 / pp.738-745 / 2014
  • Recently, the number of mobile users has been increasing continuously, and so has the number of mobile applications. As mobile applications increase, users have come to store sensitive and private information such as bank information, location information, IDs, and passwords on their mobile devices. As a result, malicious applications that target mobile devices rather than the PC environment are increasing. In particular, since Android is an open platform with security vulnerabilities, attackers prefer this environment. This paper analyzes the performance of a malware detection system that applies a linear SVM machine learning classifier to detect Android malware applications. It also performs feature selection to improve detection performance.

Feature Selection Based on Class Separation in Handwritten Numeral Recognition Using Neural Network (신경망을 이용한 필기 숫자 인식에서 부류 분별에 기반한 특징 선택)

  • Lee, Jin-Seon
    • The Transactions of the Korea Information Processing Society / v.6 no.2 / pp.543-551 / 1999
  • The primary purposes in this paper are to analyze the class separation of features in handwritten numeral recognition and to make use of the results in feature selection. Using the Parzen window technique, we compute the class distributions and define the class separation to be the overlapping distance of two class distributions. The dimension of a feature vector is reduced by removing the void or redundant feature cells based on the class separation information. The experiments have been performed on the CENPARMI handwritten numeral database, and partial classification and full classification have been tested. The results show that the class separation is very effective for the feature selection in the 10-class handwritten numeral recognition problem since we could reduce the dimension of the original 256-dimensional feature vector by 22%.
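The Parzen-window idea in the abstract above can be sketched for a single one-dimensional feature. The abstract does not give the exact definition of the "overlapping distance", so this sketch assumes one plausible reading: the area under the pointwise minimum of the two estimated class densities (near 1 for indistinguishable classes, near 0 for well-separated ones). The Gaussian kernel, the bandwidth `h`, and the function names are assumptions.

```python
import math

def parzen_density(x, samples, h):
    """Gaussian-kernel Parzen window estimate of the density at x."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

def class_overlap(samples_a, samples_b, h=0.5, steps=2000):
    """Approximate the integral of min(p_a, p_b) with a midpoint Riemann sum."""
    lo = min(samples_a + samples_b) - 5 * h
    hi = max(samples_a + samples_b) + 5 * h
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += min(parzen_density(x, samples_a, h),
                     parzen_density(x, samples_b, h))
    return total * dx
```

Under this reading, a feature cell whose overlap is close to 1 for every pair of classes carries little discriminative information and would be a candidate for removal.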


Design of Lazy Classifier based on Fuzzy k-Nearest Neighbors and Reconstruction Error (퍼지 k-Nearest Neighbors 와 Reconstruction Error 기반 Lazy Classifier 설계)

  • Roh, Seok-Beom;Ahn, Tae-Chon
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.1 / pp.101-108 / 2010
  • In this paper, we propose a new lazy classifier based on the fuzzy k-nearest neighbors approach and feature selection driven by reconstruction error. Reconstruction error is the performance index for locally linear reconstruction. When a new query point is given, the fuzzy k-nearest neighbors approach defines the local area where the local classifier is valid and assigns weighting values to the data patterns within that area. After the local area is defined and the weighting values are assigned, feature selection is carried out to reduce the dimension of the feature space. Once features are selected in terms of the reconstruction error, the local classifier, which is a kind of polynomial, is built using weighted least squares estimation. In addition, the experiments include a comparative analysis against several commonly used methods such as standard neural networks, support vector machines, linear discriminant analysis, and C4.5 trees.
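As a rough illustration of the fuzzy k-nearest neighbors step mentioned in the abstract above, the sketch below uses the commonly cited inverse-distance membership weighting with fuzzifier `m` and crisp neighbor labels. The paper's own weighting scheme, its reconstruction-error feature selection, and its polynomial local classifier are not reproduced here; the function name and defaults are assumptions.

```python
import math

def fuzzy_knn_memberships(query, points, labels, k=3, m=2.0, eps=1e-12):
    """Return class memberships of `query` from its k nearest neighbors.

    Each neighbor votes for its own label with weight 1 / d^(2/(m-1)),
    so closer neighbors count more; memberships are normalised to sum to 1.
    """
    nearest = sorted(
        (math.dist(query, p), lab) for p, lab in zip(points, labels)
    )[:k]
    weighted = [(1.0 / (d ** (2.0 / (m - 1.0)) + eps), lab) for d, lab in nearest]
    total = sum(w for w, _ in weighted)
    memberships = {}
    for w, lab in weighted:
        memberships[lab] = memberships.get(lab, 0.0) + w / total
    return memberships
```

The same neighbor weights could plausibly serve as the sample weights of a weighted least squares fit in the local area, although the abstract does not say exactly how the weights are derived.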

A study of creative human judgment through the application of machine learning algorithms and feature selection algorithms

  • Kim, Yong Jun;Park, Jung Min
    • International journal of advanced smart convergence / v.11 no.2 / pp.38-43 / 2022
  • Defining and judging creative people is difficult because there is no systematic analysis method based on accurate standards or numerical values. In the previous study, "A study on the application of rule success cases through machine learning algorithm extraction", a case study was conducted to help verify or confirm psychological personality tests and aptitude tests. There, we proposed a solution to a research problem in psychology using machine learning algorithms and the Cross Industry Standard Process for Data Mining (CRISP-DM), both of which had been used in earlier studies. Building on that work, this study proposes a solution that helps to judge creative people by applying feature selection algorithms. Accuracy was measured using seven feature selection algorithms: the feature groups chosen by these algorithms were classified with the support vector machine algorithm, and the feature group giving the highest classification result was identified.

Study on Correlation-based Feature Selection in an Automatic Quality Inspection System using Support Vector Machine (SVM) (SVM 기반 자동 품질검사 시스템에서 상관분석 기반 데이터 선정 연구)

  • Song, Donghwan;Oh, Yeong Gwang;Kim, Namhun
    • Journal of Korean Institute of Industrial Engineers / v.42 no.6 / pp.370-376 / 2016
  • Manufacturing data analysis and its applications are gaining great popularity in various industries. Despite rapid advances in big data analysis technology, however, the manufacturing quality data monitored by automated inspection systems is sometimes not reliable enough because of the complex patterns of product quality. In this study, we therefore aim to define the level of trust of an automated quality inspection system and improve the reliability of the quality inspection data. Using correlation analysis and feature selection, this paper presents a method for improving inspection accuracy and efficiency in an SVM-based automatic product quality inspection system that uses thermal image data from an auto part manufacturing case. The proposed method is implemented in the sealer dispensing process of automobile manufacturing and verified by analyzing the optimal feature selection from the quality analysis results.
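The abstract above does not say which correlation measure or selection rule is used, so the following is only a sketch of one simple form of correlation-based selection: keep the features whose absolute Pearson correlation with the inspection outcome reaches a threshold. The threshold value and the feature names in the usage note are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0.0 or dev_y == 0.0:
        return 0.0
    return cov / (dev_x * dev_y)

def select_by_correlation(columns, target, threshold=0.5):
    """Keep the names of columns with |r| >= threshold against the target."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]
```

For example, with a hypothetical `peak_temp` column that rises with the defect label and a constant `flat` column, only `peak_temp` would be passed on to the SVM.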

Evaluating the Contribution of Spectral Features to Image Classification Using Class Separability

  • Ye, Chul-Soo
    • Korean Journal of Remote Sensing / v.36 no.1 / pp.55-65 / 2020
  • Image classification requires comparing the spectral features of each pixel with the representative spectral features of each class. The spectral similarity is obtained by computing the spectral feature vector distance between the pixel and the class. Each spectral feature contributes differently to the image classification depending on its class separability, which is computed using a suitable vector distance measure such as the Bhattacharyya distance. We propose a method to determine the weight value of each spectral feature in the computation of the feature vector distance for the similarity measurement. The weight value is the ratio of each feature's separability value to the total separability value of all the spectral features. We created ten spectral features consisting of seven bands of a Landsat-8 OLI image and three indices: NDVI, NDWI and NDBI. For three experimental test sites, we obtained overall accuracies between 95.0% and 97.5% and kappa coefficients between 90.43% and 94.47%.
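The weighting rule in the abstract above (each feature's separability divided by the total separability) can be sketched as follows. The sketch assumes each spectral feature is modeled per class as a univariate Gaussian, for which the Bhattacharyya distance has a closed form, and it assumes the weights enter a weighted Euclidean distance; the abstract confirms neither detail, and the function names are illustrative.

```python
import math

def bhattacharyya_1d(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya distance between two univariate Gaussians."""
    term_mean = 0.25 * (mean_a - mean_b) ** 2 / (var_a + var_b)
    term_var = 0.5 * math.log((var_a + var_b) / (2.0 * math.sqrt(var_a * var_b)))
    return term_mean + term_var

def separability_weights(separabilities):
    """Weight of each feature = its separability / total separability."""
    total = sum(separabilities)
    return [s / total for s in separabilities]

def weighted_distance(pixel, class_mean, weights):
    """Weighted Euclidean distance between a pixel and a class mean vector."""
    return math.sqrt(sum(w * (p - m) ** 2
                         for w, p, m in zip(weights, pixel, class_mean)))
```

With more than two classes, the per-feature separability would have to be aggregated over class pairs (for instance by averaging), which is another choice the abstract leaves open.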

Energy Theft Detection Based on Feature Selection Methods and SVM (특징 선택과 서포트 벡터 머신을 활용한 에너지 절도 검출)

  • Lee, Jiyoung;Sun, Young-Ghyu;Lee, Seongwoo;Kim, Jin-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.5 / pp.119-125 / 2021
  • As electricity grid systems have become intelligent with the development of ICT, power utilities can acquire and analyze the power consumption information of users connected to the grid. In this paper, energy theft, which is emerging as a main cause of economic loss in the smart grid, is addressed using feature selection methods. The data preprocessing of the proposed system consists of five steps. In the feature selection step, features are selected using analysis of variance and mutual information (MI) based methods, both of which are filter-based feature selection methods. According to the simulation results, the support vector machine classifier performs better with the MI-based feature selection method than when all features of the input data are used.
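A filter method of the MI kind named in the abstract above scores each feature independently of the classifier and keeps the top-ranked ones. The sketch below computes mutual information for discrete (already binned) feature values; real consumption data is continuous and would need discretization or a continuous MI estimator first, a step the abstract does not describe. The column names in the usage note are hypothetical.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    count_x = Counter(feature)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        mi += p_xy * math.log(p_xy * n * n / (count_x[x] * count_y[y]))
    return mi

def select_top_k(columns, labels, k):
    """Return the names of the k columns with the highest MI score."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For instance, `select_top_k({"noise": [0, 1, 0, 1], "usage_drop": [0, 0, 1, 1]}, [0, 0, 1, 1], 1)` keeps only `usage_drop`, which would then be the input to the SVM classifier.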

Landslide susceptibility assessment using feature selection-based machine learning models

  • Liu, Lei-Lei;Yang, Can;Wang, Xiao-Mi
    • Geomechanics and Engineering / v.25 no.1 / pp.1-16 / 2021
  • Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs or conditioning factors for these models, however, can reduce computational efficiency and make data collection more difficult. Feature selection is a good tool for addressing this problem: it selects the most important features among all factors to reduce the number of input variables. However, two important questions need to be answered: (1) how do feature selection methods affect the performance of machine learning models? and (2) which feature selection method is the most suitable for a given machine learning model? This paper aims to address these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (i.e., logistic regression, support vector machine, artificial neural network, Gaussian process and random forest) and six typical feature selection methods from the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 landslides are used as recorded data. Next, feature selection methods are used to obtain the importance of the conditioning factors and create feature subsets, based on which 13 FS-ML models are constructed. For each of the machine learning models, the best-optimized FS-ML model is selected according to the area under curve value. Finally, five optimal FS-ML models are obtained and applied to the LSA of the studied area. The predictive abilities of the FS-ML models on LSA are verified and compared through the receiver operating characteristic curve and statistical indicators such as sensitivity, specificity and accuracy. The results showed that different feature selection methods have different effects on the performance of LSA machine learning models. FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the recursive feature elimination (RFE) optimized RF, and RFE is an optimal method for feature selection.
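Recursive feature elimination, named in the abstract above, retrains a model on the surviving features and drops the least important one each round. The sketch below is an assumption-laden illustration, not the paper's setup: it ranks features by the squared weights of a small linear SVM trained with batch subgradient descent (the SVM-RFE variant), whereas the paper's best model pairs RFE with a random forest. All function names and hyperparameters are illustrative.

```python
def train_linear_svm(rows, labels, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularised hinge loss.

    labels must be +1 or -1; no bias term is fitted, to keep the sketch short.
    """
    n, d = len(rows), len(rows[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for x, y in zip(rows, labels):
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            if margin < 1.0:  # only margin violators contribute
                for j in range(d):
                    grad[j] -= y * x[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def svm_rfe(rows, labels, n_keep):
    """Retrain repeatedly, each time dropping the smallest squared weight."""
    remaining = list(range(len(rows[0])))
    while len(remaining) > n_keep:
        reduced = [[x[j] for j in remaining] for x in rows]
        w = train_linear_svm(reduced, labels)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[worst]
    return remaining
```

The function returns the original column indices of the surviving features. Retraining after every removal is what distinguishes RFE from a one-shot ranking, since the weights of correlated features can change once one of them is gone.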

Support vector machines with optimal instance selection: An application to bankruptcy prediction

  • Ahn Hyun-Chul;Kim Kyoung-Jae;Han In-Goo
    • Proceedings of the Korea Intelligent Information System Society Conference / 2006.06a / pp.167-175 / 2006
  • Building accurate corporate bankruptcy prediction models has been one of the most important research issues in finance. Recently, support vector machines (SVMs) have been widely applied to bankruptcy prediction because of their many strengths. However, to use an SVM, a modeler must determine several factors heuristically, which hinders accurate prediction. As a result, some researchers have tried to optimize these factors, especially the feature subset and kernel parameters of the SVM. However, no studies have attempted to determine an appropriate instance subset for the SVM, although doing so may improve performance by eliminating distorted cases. Thus, in this study, we propose the simultaneous optimization of instance selection and the parameters of the SVM kernel function using genetic algorithms (GAs). Experimental results show that our model outperforms not only the conventional SVM but also prior approaches for optimizing SVMs.
