• Title/Abstract/Keywords: hybrid feature selection model


Hybrid Case-based Reasoning and Genetic Algorithms Approach for Customer Classification

  • Kim Kyoung-jae;Ahn Hyunchul
    • Journal of information and communication convergence engineering / Vol. 3, No. 4 / pp.209-212 / 2005
  • This study proposes a hybrid case-based reasoning (CBR) and genetic algorithm model for customer classification. The vertical and horizontal dimensions of the research data are reduced through an integrated feature and instance selection process using genetic algorithms. We applied the proposed model to a customer classification task that uses customers' demographic characteristics as inputs to predict their buying behavior for a specific product. Experimental results show that the proposed model may improve classification accuracy and outperform various optimization models of a typical CBR system.

Improved Network Intrusion Detection Model through Hybrid Feature Selection and Data Balancing

  • 민병준;유지훈;신동규;신동일
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 10, No. 2 / pp.65-72 / 2021
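  • As attacks on network environments rapidly grow more sophisticated and intelligent, the limitations of existing signature-based intrusion detection systems are becoming clear. To address this, research on machine-learning-based intrusion detection systems is actively under way. However, applying machine learning to intrusion detection raises two problems. The first is selecting the important features relevant to learning for real-time detection. The second is the imbalance of the training data, which is critical because machine learning algorithms depend on their data. To solve these problems, this paper proposes HFS-DNN, a deep-neural-network-based network intrusion detection model that uses Hybrid Feature Selection and Data Balancing. The model was trained on the NSL-KDD dataset and its performance was compared with existing classification models. We confirmed that the proposed Hybrid Feature Selection algorithm does not distort the performance of the learning model, and in experiments among models trained on balanced data, the proposed model showed the best performance.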

A Hybrid Efficient Feature Selection Model for High Dimensional Data Set based on KNHNAES (2013~2015)

  • 권태일;이정곤;박현우;류광선;김의탁;박명호
    • 디지털콘텐츠학회 논문지 / Vol. 19, No. 4 / pp.739-747 / 2018
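  • For high-dimensional data, feature selection has become a very important step among data mining techniques. However, traditional single feature selection methods may no longer be suitable as efficient feature selection techniques. In this paper, we propose a hybrid feature selection technique for efficient feature selection on high-dimensional data. When the proposed hybrid feature selection technique was applied to the KNHANES data for classification, accuracy improved by more than 5% over models that apply existing classification techniques.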

Hybrid Feature Selection Method Based on a Naïve Bayes Algorithm that Enhances the Learning Speed while Maintaining a Similar Error Rate in Cyber ISR

  • Shin, GyeongIl;Yooun, Hosang;Shin, DongIl;Shin, DongKyoo
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 12 / pp.5685-5700 / 2018
  • Cyber intelligence, surveillance, and reconnaissance (ISR) has become more important than traditional military ISR. An agent used in cyber ISR resides in an enemy's networks and continually collects valuable information. Thus, this agent should be able to determine what is, and is not, useful in a short amount of time. Moreover, the agent should maintain a classification rate that is high enough to select useful data from the enemy's network. Traditional feature selection algorithms cannot comply with these requirements. Consequently, in this paper, we propose an effective hybrid feature selection method derived from the filter and wrapper methods. We illustrate the design of the proposed model and the experimental results of the performance comparison between the proposed model and the existing model.
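The abstract above names only the general recipe (a hybrid derived from filter and wrapper methods), so the following is a minimal sketch of that recipe under stated assumptions, not the authors' algorithm. The filter stage ranks features by absolute Pearson correlation with the label, and the wrapper stage runs greedy forward selection. A nearest-centroid classifier scored on the training data stands in for the paper's Naïve Bayes model to keep the example short; all function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def filter_stage(X, y, k):
    """Filter step: keep the k feature indices with the largest |correlation| to y."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier (assumed stand-in model)."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]

    def predict(row):
        return min(
            centroids,
            key=lambda lab: sum((row[j] - c) ** 2 for j, c in zip(features, centroids[lab])),
        )

    return mean(1.0 if predict(row) == lab else 0.0 for row, lab in zip(X, y))


def wrapper_stage(X, y, candidates):
    """Wrapper step: greedily add the candidate that most improves accuracy."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for j in candidates:
            if j in selected:
                continue
            acc = centroid_accuracy(X, y, selected + [j])
            if acc > best:  # strict improvement only, so the loop terminates
                best, best_j, improved = acc, j, True
        if improved:
            selected.append(best_j)
    return selected, best


# Hypothetical toy data: feature 0 is informative, feature 2 is constant.
X = [
    [1.0, 5.0, 3.0, 0.2],
    [1.2, 1.0, 3.0, 0.9],
    [0.9, 4.0, 3.0, 0.1],
    [3.0, 5.0, 3.0, 0.8],
    [3.1, 1.5, 3.0, 0.3],
    [2.9, 4.0, 3.0, 0.9],
]
y = [0, 0, 0, 1, 1, 1]

shortlist = filter_stage(X, y, k=2)
selected, accuracy = wrapper_stage(X, y, shortlist)
print(shortlist, selected, accuracy)
```

The cheap filter pass shrinks the candidate set before the costlier wrapper search runs, which is the usual motivation for combining the two. In practice the wrapper score would come from held-out data rather than training accuracy.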

Hybrid Feature Selection Method Based on Genetic Algorithm for the Diagnosis of Coronary Heart Disease

  • Wiharto, Wiharto;Suryani, Esti;Setyawan, Sigit;Putra, Bintang PE
    • Journal of information and communication convergence engineering / Vol. 20, No. 1 / pp.31-40 / 2022
  • Coronary heart disease (CHD) is a comorbidity of COVID-19; therefore, routine early diagnosis is crucial. A large number of examination attributes in the context of diagnosing CHD is a distinct obstacle during the pandemic when the number of health service users is significant. The development of a precise machine learning model for diagnosis with a minimum number of examination attributes can allow examinations and healthcare actions to be undertaken quickly. This study proposes a CHD diagnosis model based on feature selection, data balancing, and ensemble-based classification methods. In the feature selection stage, a hybrid SVM-GA combined with fast correlation-based filter (FCBF) is used. The proposed system achieved an accuracy of 94.60% and area under the curve (AUC) of 97.5% when tested on the z-Alizadeh Sani dataset and used only 8 of 54 inspection attributes. In terms of performance, the proposed model can be placed in the very good category.

A Hybrid Multi-Level Feature Selection Framework for prediction of Chronic Disease

  • G.S. Raghavendra;Shanthi Mahesh;M.V.P. Chandrasekhara Rao
    • International Journal of Computer Science & Network Security / Vol. 23, No. 12 / pp.101-106 / 2023
  • Chronic illnesses are among the most common serious problems affecting human health. Early diagnosis of chronic diseases can help to avoid or mitigate their consequences, potentially decreasing mortality rates. Using machine learning algorithms to identify risk factors is an exciting strategy. The issue with existing feature selection approaches is that each method yields a distinct set of features, which affects model accuracy, and present methods do not perform well on huge multidimensional datasets. We introduce a novel model containing a feature selection approach that selects optimal features from big multidimensional datasets to provide reliable predictions of chronic illnesses without sacrificing data uniqueness.[1] To ensure the success of the proposed model, we balanced the classes by applying hybrid balanced class sampling methods to the original dataset, together with data pre-processing and data transformation methods, to provide credible data for training the model. We ran and assessed our model on datasets with binary and multi-valued classifications, using multiple datasets (Parkinson, arrhythmia, breast cancer, kidney, diabetes). Suitable features are selected using a hybrid feature model consisting of LassoCV, decision tree, random forest, gradient boosting, AdaBoost, and stochastic gradient descent, followed by voting on the attributes that these methods output in common. The accuracy on the original dataset before applying the framework is recorded and compared against the accuracy on the reduced set of attributes. The results are shown separately to provide comparisons. Based on the result analysis, we conclude that the proposed model produced higher accuracy on multi-valued class datasets than on binary class datasets.[1]
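The voting step described in this abstract can be illustrated with a short sketch. It assumes each selector has already produced its own list of chosen features. The selector outputs, the feature names, and the `min_votes` threshold are hypothetical, since the abstract does not state how many methods must agree.

```python
from collections import Counter


def vote_features(selections, min_votes):
    """Keep the features chosen by at least `min_votes` of the selectors.

    `selections` maps a selector name to the features it picked.
    Setting min_votes = len(selections) gives a strict intersection.
    """
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical outputs of four of the selectors named in the abstract.
selections = {
    "lasso_cv": ["age", "bmi", "glucose"],
    "decision_tree": ["age", "glucose", "bp"],
    "random_forest": ["age", "bmi", "glucose", "bp"],
    "gradient_boosting": ["glucose", "age"],
}

majority = vote_features(selections, min_votes=3)
common = vote_features(selections, min_votes=len(selections))
print(majority, common)
```

A lower threshold keeps more features at the cost of including ones that only some selectors consider useful.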

Network intrusion detection Model through Hybrid Feature Selection and Data Balancing

  • 민병준;신동규;신동일
    • 한국정보처리학회:학술대회논문집 / Korea Information Processing Society 2020 Spring Conference / pp.526-529 / 2020
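  • As attacks on network environments rapidly grow more sophisticated and intelligent, the limitations of existing signature-based intrusion detection systems are becoming clear. Research on machine-learning-based intrusion detection systems is actively under way to address this, but applying machine learning to intrusion detection raises two problems. The first is selecting the important features relevant to learning for real-time detection. The second is the imbalance of the training data, which is critical because machine learning algorithms depend on their data. To solve these problems, this paper proposes a deep-neural-network-based network intrusion detection model that uses Hybrid Feature Selection and Data Balancing. The model was trained on the NSL-KDD dataset and evaluated using Accuracy, Precision, Recall, and F1 Score. Compared with Random Forest and a basic deep neural network model, the proposed model improved performance by 7~9% in terms of F1 Score.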

Relevancy contemplation in medical data analytics and ranking of feature selection algorithms

  • P. Antony Seba;J. V. Bibal Benifa
    • ETRI Journal / Vol. 45, No. 3 / pp.448-461 / 2023
  • This article performs a detailed data scrutiny on a chronic kidney disease (CKD) dataset to select efficient instances and relevant features. Data relevancy is investigated using feature extraction, hybrid outlier detection, and handling of missing values. Data instances that do not influence the target are removed using data envelopment analysis to enable reduction of rows. Column reduction is achieved by ranking the attributes through feature selection methodologies, namely, extra-trees classifier, recursive feature elimination, chi-squared test, analysis of variance, and mutual information. These methodologies are ranked via Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) using weight optimization to identify the optimal features for model building from the CKD dataset to facilitate better prediction while diagnosing the severity of the disease. An efficient hybrid ensemble and novel similarity-based classifiers are built using the pruned dataset, and the results are thereafter compared with random forest, AdaBoost, naive Bayes, k-nearest neighbors, and support vector machines. The hybrid ensemble classifier yields a better prediction accuracy of 98.31% for the features selected by extra tree classifier (ETC), which is ranked as the best by TOPSIS.
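For readers unfamiliar with TOPSIS, the ranking step mentioned in this abstract can be sketched as follows. This is a generic textbook TOPSIS with fixed weights, not the article's weight-optimized variant. The criteria (accuracy as a benefit, number of retained features as a cost), the method labels, and all numbers are hypothetical.

```python
from math import sqrt


def topsis(matrix, weights, benefit):
    """Return one closeness score in [0, 1] per alternative (higher is better).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True if larger is better for that criterion, False if smaller is
    Assumes at least two alternatives differ; otherwise the ratio is 0/0.
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal and anti-ideal points, taken column by column.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best = sqrt(sum((a - b) ** 2 for a, b in zip(row, ideal)))
        d_worst = sqrt(sum((a - b) ** 2 for a, b in zip(row, anti)))
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical results: (accuracy, number of features kept) per method.
methods = ["method_A", "method_B", "method_C"]
matrix = [[0.98, 10], [0.95, 20], [0.90, 30]]
scores = topsis(matrix, weights=[0.5, 0.5], benefit=[True, False])
ranking = [m for _, m in sorted(zip(scores, methods), reverse=True)]
print(ranking)
```

Here `method_A` is best on both criteria, so it coincides with the ideal point and receives the maximum closeness score.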

A Novel Image Classification Method for Content-based Image Retrieval via a Hybrid Genetic Algorithm and Support Vector Machine Approach

  • Seo, Kwang-Kyu
    • 반도체디스플레이기술학회지 / Vol. 10, No. 3 / pp.75-81 / 2011
  • This paper presents a novel method for image classification based on a hybrid genetic algorithm (GA) and support vector machine (SVM) approach which can significantly improve the classification performance for content-based image retrieval (CBIR). Though SVM has been widely applied to CBIR, it has some problems such as the kernel parameters setting and feature subset selection of SVM which impact the classification accuracy in the learning process. This study aims at simultaneously optimizing the parameters of SVM and feature subset without degrading the classification accuracy of SVM using GA for CBIR. Using the hybrid GA and SVM model, we can classify more images in the database effectively. Experiments were carried out on a large-size database of images and experiment results show that the classification accuracy of conventional SVM may be improved significantly by using the proposed model. We also found that the proposed model outperformed all the other models such as neural network and typical SVM models.

Simultaneous optimization method of feature transformation and weighting for artificial neural networks using genetic algorithm : Application to Korean stock market

  • Kim, Kyoung-jae;Ingoo Han
    • 한국지능정보시스템학회:학술대회논문집 / Korea Intelligent Information Systems Society 1999 Fall Conference: Intelligent Information Technology and Future Organization / pp.323-335 / 1999
  • In this paper, we propose a new hybrid model of artificial neural networks (ANNs) and a genetic algorithm (GA) for optimal feature transformation and feature weighting. Previous research proposed several variants of hybrid ANN and GA models, including feature weighting, feature subset selection, and network structure optimization. In the vast majority of these studies, however, ANNs did not learn the patterns of the data well because GA was employed only in a simple way. In this study, we incorporate GA in a simultaneous manner to improve the learning and generalization ability of ANNs: GA optimizes feature weighting and feature transformation simultaneously. Globally optimized feature weighting overcomes the well-known limitations of the gradient descent algorithm, and globally optimized feature transformation reduces the dimensionality of the feature space and eliminates irrelevant factors in modeling ANNs. By this procedure, we can improve the performance and enhance the generalizability of ANNs.
