Feature Selection for Anomaly Detection Based on Genetic Algorithm

유전 알고리즘 기반의 비정상 행위 탐지를 위한 특징선택

  • Received : 2018.04.16
  • Accepted : 2018.07.20
  • Published : 2018.07.28


Feature selection, a data preprocessing technique, is a major research area in applications that deal with large datasets. It has long been used in pattern recognition, machine learning, and data mining, and is now widely applied in fields such as text classification, image retrieval, intrusion detection, and genome analysis. There are two approaches to finding feature subsets: the filter method and the wrapper method. The proposed method is based on a genetic algorithm, a meta-heuristic algorithm, and uses the wrapper method, which evaluates candidate feature subsets with an actual classifier, to search for an optimal feature subset. The training dataset used in the experiment has a severe class imbalance, which makes it difficult to improve classification performance for rare classes. We therefore preprocess the training dataset with SMOTE, then select features and evaluate them with various machine learning algorithms.
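The wrapper-style genetic search described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data, the nearest-centroid classifier standing in for the "actual classifier", the helper names (make_data, centroid_accuracy, ga_select), and all GA settings (population size, tournament selection, one-point crossover, mutation rate, subset-size penalty). SMOTE preprocessing is omitted.

```python
import random


def make_data(n=200, n_features=10, informative=(0, 3, 7), seed=0):
    """Synthetic two-class data (hypothetical): only `informative` columns differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(X_tr, y_tr, X_te, y_te, mask):
    """Hold-out accuracy of a nearest-centroid classifier on the columns where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_te, y_te):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(y_te)


def ga_select(X, y, pop_size=20, generations=15, p_mut=0.05, penalty=0.01, seed=1):
    """Search for a feature bitmask; each chromosome is scored by training a classifier."""
    rng = random.Random(seed)
    n_features = len(X[0])
    split = len(X) // 2
    X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]

    def fitness(mask):
        # Wrapper evaluation: classifier accuracy minus a small cost per selected feature.
        return centroid_accuracy(X_tr, y_tr, X_te, y_te, mask) - penalty * sum(mask)

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        next_pop = scored[:2]  # elitism: carry the two best masks over unchanged
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(scored, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(scored, 3), key=fitness)
            cut = rng.randint(1, n_features - 1)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)


X, y = make_data()
best_mask, best_score = ga_select(X, y)
print("selected feature indices:", [j for j, b in enumerate(best_mask) if b])
print("penalized hold-out accuracy:", round(best_score, 3))
```

With an imbalanced dataset such as the one the abstract describes, plain accuracy would likely be a poor fitness signal for rare classes; a per-class measure would be a natural substitute in this sketch.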


Intrusion Detection; Machine Learning; Genetic Algorithm; Feature Selection; PCA




Supported by: National Research Foundation of Korea (NRF)