• Title/Summary/Keyword: feature vector selection

Search results: 184

Feature Selection for Performance Improvement of Android Malware Detection (안드로이드 악성코드 탐지 성능 향상을 위한 Feature 선정)

  • Kim, Hwan-Hee;Ham, Hyo-Sik;Choi, Mi-Jung
    • Annual Conference of KIPS / 2013.11a / pp.751-753 / 2013
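  • The Android platform has more security vulnerabilities than other mobile platforms. Consequently, most mobile malware appearing today occurs on the Android platform. Among current malware detection techniques, methods that adopt machine learning respond flexibly to malware variants. However, when unnecessary features are used as training data, machine learning techniques can overfit, which degrades overall performance. In this paper, we generate feature vectors by monitoring resources on the Android platform and, using a feature-selection algorithm, report performance metrics for malware detection with machine learning classifiers as a function of the number of features. Through this, we explain the necessity and importance of feature selection in machine-learning-based malware detection.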

Efficient Iris Recognition through Improvement of Feature Vector and Classifier

  • Lim, Shin-Young;Lee, Kwan-Yong;Byeon, Ok-Hwan;Kim, Tai-Yun
    • ETRI Journal / v.23 no.2 / pp.61-70 / 2001
  • In this paper, we propose an efficient method for personal identification by analyzing iris patterns that have a high level of stability and distinctiveness. To improve the efficiency and accuracy of the proposed system, we present a new approach to making a feature vector compact and efficient by using wavelet transform, and two straightforward but efficient mechanisms for a competitive learning method such as a weight vector initialization and the winner selection. With all of these novel mechanisms, the experimental results showed that the proposed system could be used for personal identification in an efficient and effective manner.


Compiler Analysis Framework Using SVM-Based Genetic Algorithm : Feature and Model Selection Sensitivity (SVM 기반 유전 알고리즘을 이용한 컴파일러 분석 프레임워크 : 특징 및 모델 선택 민감성)

  • Hwang, Cheol-Hun;Shin, Gun-Yoon;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.4 / pp.537-544 / 2020
  • Detection-evasion techniques such as mutation and obfuscation are advancing along with malware technology. In malware detection, detecting unknown malware is important, and Malware Authorship Attribution, which detects unknown malicious code by identifying the author of distributed malware, is being studied. In this paper, we extract the compiler information that affects binary-based author identification and investigate how sensitive classification efficiency is to feature selection, probabilistic and non-probabilistic models, and optimization across studies. In the experiments, feature selection by information gain and the support vector machine, a non-probabilistic model, showed high efficiency. Among the optimization studies, the proposed framework obtained high classification accuracy through feature selection and model optimization, with a 48% feature reduction and 53 faster execution speed. This study confirms the sensitivity of classification efficiency to the feature selection, model, and optimization methods.
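The information-gain criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes discrete features and uses hypothetical toy data; the function names (`entropy`, `information_gain`, `rank_features`) are illustrative and are not taken from the paper's framework.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Label entropy minus the expected entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_features(rows, labels):
    """Return feature indices sorted by decreasing information gain, plus the gains."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return order, gains

# Hypothetical toy data: feature 0 determines the author, feature 1 does not.
rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = ["A", "A", "B", "B"]
order, gains = rank_features(rows, labels)
```

Keeping only the top-ranked indices from `order` is one simple way to reduce the feature set before training a classifier.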

A Study on automatic assignment of descriptors using machine learning (기계학습을 통한 디스크립터 자동부여에 관한 연구)

  • Kim, Pan-Jun
    • Journal of the Korean Society for information Management / v.23 no.1 s.59 / pp.279-299 / 2006
  • This study applies various machine learning approaches to automatically assigning descriptors to journal articles. The effectiveness of feature selection and the effect of training set size were examined after selecting core journals in the field of information science and building a test collection from the articles of the past 11 years. Regarding feature selection, the Support Vector Machine (SVM) performed best when trained after the feature set was reduced using $\chi^2$ statistics (CHI) and criteria that prefer high-frequency features (COS, GSS, JAC). The size of the training set significantly influenced the performance of the SVM and the Voted Perceptron (VTP). However, it had little effect on Naive Bayes (NB).
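As an illustration of the $\chi^2$ (CHI) criterion named above, the following sketch scores each term against one descriptor using the standard 2x2 contingency formulation. The documents, the descriptor name, and the function names are hypothetical; the abstract does not specify the study's exact procedure.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category without the term
    d: documents outside the category without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term or category is constant, so it carries no information
    return n * (a * d - c * b) ** 2 / denom

def chi_square_scores(docs, labels, category):
    """Score every term in the vocabulary against one category."""
    vocabulary = set().union(*docs)
    scores = {}
    for term in vocabulary:
        a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == category)
        b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != category)
        c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == category)
        d = sum(1 for doc, y in zip(docs, labels) if term not in doc and y != category)
        scores[term] = chi_square(a, b, c, d)
    return scores

# Hypothetical toy collection: each document is a set of terms.
docs = [{"index", "thesaurus"}, {"index", "retrieval"},
        {"network", "retrieval"}, {"network", "library"}]
labels = ["indexing", "indexing", "other", "other"]
scores = chi_square_scores(docs, labels, "indexing")
```

Terms with the highest scores would be kept as features; a term spread evenly across categories, such as "retrieval" here, scores zero.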

Gene Selection Based on Support Vector Machine using Bootstrap (붓스트랩 방법을 활용한 SVM 기반 유전자 선택 기법)

  • Song, Seuck-Heun;Kim, Kyoung-Hee;Park, Chang-Yi;Koo, Ja-Yong
    • The Korean Journal of Applied Statistics / v.20 no.3 / pp.531-540 / 2007
  • The recursive feature elimination for support vector machine is known to be useful in selecting relevant genes. Since the criterion for choosing relevant genes is the absolute value of a coefficient, the recursive feature elimination may suffer from a scaling problem. We propose a modified version of the recursive feature elimination algorithm using bootstrap. In our method, the criterion for determining relevant genes is the absolute value of a coefficient divided by its standard error, which accounts for statistical variability of the coefficient. Through numerical examples, we illustrate that our method is effective in gene selection.
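The criterion described above, a coefficient's absolute value divided by its standard error, can be sketched as follows. This is not the authors' implementation: the linear SVM is fitted with a simple batch subgradient method chosen for self-containment, the standard error is taken as the bootstrap standard deviation of each coefficient, and the numerator is the full-data coefficient. All three are assumptions, as are the function names and the synthetic data.

```python
import random
import statistics

def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Batch subgradient descent on the L2-regularized hinge loss (labels +1/-1)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # this sample violates the margin
                for j in range(p):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w

def bootstrap_scores(X, y, n_boot=20, seed=0):
    """Score each feature by |full-data coefficient| / bootstrap standard error."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w_full = fit_linear_svm(X, y)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        samples.append(fit_linear_svm([X[i] for i in idx], [y[i] for i in idx]))
    scores = []
    for j in range(p):
        se = statistics.stdev([s[j] for s in samples])
        scores.append(abs(w_full[j]) / se if se > 0 else float("inf"))
    return scores

def rfe_ranking(X, y):
    """Recursive elimination: drop the lowest-scoring feature, refit, repeat."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        Xr = [[row[j] for j in remaining] for row in X]
        scores = bootstrap_scores(Xr, y)
        worst = min(range(len(remaining)), key=lambda k: scores[k])
        eliminated.append(remaining.pop(worst))
    return remaining + eliminated[::-1]  # most relevant feature first

# Synthetic data (assumption): feature 0 carries the signal,
# feature 1 is large-scale noise, feature 2 is unit-scale noise.
rng = random.Random(1)
y = [1 if i % 2 == 0 else -1 for i in range(40)]
X = [[yi + rng.gauss(0, 0.3), rng.gauss(0, 5.0), rng.gauss(0, 1.0)] for yi in y]
ranking = rfe_ranking(X, y)
```

Dividing by the standard error makes the score insensitive to a feature's measurement scale, which is the scaling problem the abstract attributes to ranking by raw coefficient magnitude.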

Hybrid Feature Selection Method Based on Genetic Algorithm for the Diagnosis of Coronary Heart Disease

  • Wiharto, Wiharto;Suryani, Esti;Setyawan, Sigit;Putra, Bintang PE
    • Journal of information and communication convergence engineering / v.20 no.1 / pp.31-40 / 2022
  • Coronary heart disease (CHD) is a comorbidity of COVID-19; therefore, routine early diagnosis is crucial. A large number of examination attributes in the context of diagnosing CHD is a distinct obstacle during the pandemic when the number of health service users is significant. The development of a precise machine learning model for diagnosis with a minimum number of examination attributes can allow examinations and healthcare actions to be undertaken quickly. This study proposes a CHD diagnosis model based on feature selection, data balancing, and ensemble-based classification methods. In the feature selection stage, a hybrid SVM-GA combined with fast correlation-based filter (FCBF) is used. The proposed system achieved an accuracy of 94.60% and area under the curve (AUC) of 97.5% when tested on the z-Alizadeh Sani dataset and used only 8 of 54 inspection attributes. In terms of performance, the proposed model can be placed in the very good category.

Evaluation of the Effect of using Fractal Feature on Machine learning based Pancreatic Tumor Classification (기계학습 기반 췌장 종양 분류에서 프랙탈 특징의 유효성 평가)

  • Oh, Seok;Kim, Young Jae;Kim, Kwang Gi
    • Journal of Korea Multimedia Society / v.24 no.12 / pp.1614-1623 / 2021
  • The purpose of this paper is to evaluate the effect of using fractal features in machine-learning-based pancreatic tumor classification. We used 469 pancreas CT series comprising 1,995 benign slices and 1,772 malignant slices. Feature selection by Lasso regularization reduced the feature set from 109 features to 7. For the fractal features, the fractal dimension was obtained by the box-counting method, and the Hurst coefficient was calculated from the range of pixel values in the ROI. As a result, significant differences were observed between benign and malignant tumors. Additionally, we used a support vector machine to compare the classification performance of a model without fractal features and a model with fractal features. The model trained with fractal features showed a statistically significant difference in performance compared with the model trained without them.
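The box-counting method mentioned above can be sketched on a binary mask: count the boxes of side `s` that contain any foreground pixel, then estimate the dimension as the least-squares slope of log N(s) against log(1/s). The box sizes and the toy images are assumptions; the abstract does not say how the ROI was binarized or which sizes were used.

```python
import math

def box_count(image, size):
    """Number of size x size boxes containing at least one foreground pixel."""
    rows, cols = len(image), len(image[0])
    count = 0
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            if any(image[r][c]
                   for r in range(r0, min(r0 + size, rows))
                   for c in range(c0, min(c0 + size, cols))):
                count += 1
    return count

def box_counting_dimension(image, sizes=(1, 2, 4, 8)):
    """Least-squares slope of log N(s) versus log(1/s).

    Assumes the image contains at least one foreground pixel.
    """
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(image, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Toy masks (assumption): a filled 16x16 square and a single horizontal line.
square = [[1] * 16 for _ in range(16)]
line = [[1] * 16 if r == 0 else [0] * 16 for r in range(16)]
dim_square = box_counting_dimension(square)
dim_line = box_counting_dimension(line)
```

The two toy masks act as sanity checks: a filled region should give a dimension near 2 and a straight line a dimension near 1, with irregular tumor boundaries expected to fall in between.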

Two dimensional reduction technique of Support Vector Machines for Bankruptcy Prediction

  • Ahn, Hyun-Chul;Kim, Kyoung-Jae;Lee, Ki-Chun
    • Korea Society of Management Information Systems: Conference Proceedings (한국경영정보학회:학술대회논문집) / 2007.06a / pp.608-613 / 2007
  • Prediction of corporate bankruptcies has long been an important topic and has been studied extensively in the finance and management literature because it is an essential basis for the risk management of financial institutions. Recently, support vector machines (SVMs) have become popular as a tool for bankruptcy prediction because they use a risk function consisting of the empirical error and a regularized term derived from the structural risk minimization principle. In addition, they do not require huge training samples and have little possibility of overfitting. However, to use an SVM, a user must determine several factors by heuristics, such as the parameters of a kernel function, an appropriate feature subset, and a proper instance subset, which hinders accurate prediction. In this study, we propose a novel hybrid SVM classifier with simultaneous optimization of feature subsets, instance subsets, and kernel parameters. This study introduces genetic algorithms (GAs) to optimize the feature selection, instance selection, and kernel parameters simultaneously. Our study applies the proposed model to a real-world case of bankruptcy prediction. Experimental results show that the prediction accuracy of a conventional SVM may be improved significantly by using our model.


Improved marine predators algorithm for feature selection and SVM optimization

  • Jia, Heming;Sun, Kangjian;Li, Yao;Cao, Ning
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.4 / pp.1128-1145 / 2022
  • Owing to the rapid development of information science, data analysis based on machine learning has become an interdisciplinary and strategic area. The marine predators algorithm (MPA) is a novel metaheuristic algorithm inspired by the foraging strategies of marine organisms. Considering the randomness of these strategies, an improved algorithm called the co-evolutionary cultural mechanism-based marine predators algorithm (CECMPA) is proposed. Through this mechanism, search agents in different spaces can share knowledge and experience to improve the performance of the native algorithm. More specifically, CECMPA has a higher probability of avoiding local optima and can find the global optimum quickly. This paper is the first to use CECMPA to perform feature subset selection and optimize the hyperparameters of a support vector machine (SVM) simultaneously. To evaluate its performance, the proposed method is tested on twelve datasets from the University of California, Irvine (UCI) repository. Moreover, coronavirus disease 2019 (COVID-19), which is spreading in many countries, offers a real-world application, so CECMPA is also applied to a COVID-19 dataset. The experimental results and statistical analysis demonstrate that CECMPA is superior to the other compared methods in the literature in terms of several evaluation metrics. The proposed method has strong competitive abilities and promising prospects.

Discriminative Power Feature Selection Method for Motor Imagery EEG Classification in Brain Computer Interface Systems

  • Yu, XinYang;Park, Seung-Min;Ko, Kwang-Eun;Sim, Kwee-Bo
    • International Journal of Fuzzy Logic and Intelligent Systems / v.13 no.1 / pp.12-18 / 2013
  • Motor imagery classification in electroencephalography (EEG)-based brain-computer interface (BCI) systems is an important research area. To reduce the complexity of the classification, selected power bands and electrode channels have been widely used to extract and select features from raw EEG signals, but there is still a loss in classification accuracy in the state-of-the-art approaches. To solve this problem, we propose a discriminative feature extraction algorithm based on power bands with principal component analysis (PCA). First, the raw EEG signals from the motor cortex area were filtered using a bandpass filter with $\mu$ and $\beta$ bands. This research considered the power bands within a 0.4 second epoch to select the optimal feature space region. Next, the total feature dimensions were reduced by PCA and transformed into a final feature vector set. The selected features were classified by applying a support vector machine (SVM). The proposed method was compared with a state-of-the-art power band feature and shown to improve classification accuracy.