Feature Selection for Anomaly Detection Based on Genetic Algorithm

  • Received : 2018.04.16
  • Accepted : 2018.07.20
  • Published : 2018.07.28

Abstract

Feature selection, a data preprocessing technique, is a major research area in applications that deal with large datasets. It has long been used in pattern recognition, machine learning, and data mining, and is now widely applied in fields such as text classification, image retrieval, intrusion detection, and genome analysis. There are two approaches to finding feature subsets: the filter method and the wrapper method. The method proposed in this study is based on a genetic algorithm, a meta-heuristic algorithm, and uses the wrapper approach, which evaluates candidate feature subsets with an actual classifier, to search for an optimal feature subset. The training dataset used in the experiment has a severe class imbalance, which makes it difficult to improve classification performance for rare classes. We therefore preprocess the training dataset with SMOTE (synthetic minority over-sampling technique), then select features and evaluate them with various machine learning algorithms.
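
As an illustration of the wrapper idea described above, the following is a minimal sketch of genetic-algorithm feature selection, not the implementation used in the study. Everything in it is an assumption made for the example: the synthetic data (`make_toy_data`), the nearest-centroid classifier standing in for the "actual classifier", the small penalty on subset size, and the GA settings (tournament selection, one-point crossover, bit-flip mutation, elitism). SMOTE preprocessing is omitted.

```python
import random


def make_toy_data(n, rng):
    """Hypothetical data: features 0-1 carry the class signal, 2-5 are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(3.0 * label, 1.0), rng.gauss(-3.0 * label, 1.0)]
        noise = [rng.gauss(0.0, 5.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y


def centroid_accuracy(X_train, y_train, X_val, y_val, mask):
    """Accuracy of a nearest-centroid classifier using only the masked features."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    sums, counts = {}, {}
    for x, label in zip(X_train, y_train):
        s = sums.setdefault(label, [0.0] * len(cols))
        for j, c in enumerate(cols):
            s[j] += x[c]
        counts[label] = counts.get(label, 0) + 1
    centroids = {k: [v / counts[k] for v in s] for k, s in sums.items()}
    correct = 0
    for x, label in zip(X_val, y_val):
        pred = min(
            centroids,
            key=lambda k: sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[k])),
        )
        correct += pred == label
    return correct / len(y_val)


def ga_select(fitness, n_features, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search bit masks (1 = keep feature) that maximize the wrapper fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        best = max(scored, key=lambda t: t[0])[1]

        def tournament():
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    best_score, best = max(((fitness(ind), ind) for ind in pop), key=lambda t: t[0])
    return best, best_score


data_rng = random.Random(42)
X_train, y_train = make_toy_data(200, data_rng)
X_val, y_val = make_toy_data(100, data_rng)


def fitness(mask):
    # Wrapper fitness: validation accuracy minus an assumed penalty on subset size.
    acc = centroid_accuracy(X_train, y_train, X_val, y_val, mask)
    return acc - 0.01 * sum(mask) / len(mask)


best_mask, best_score = ga_select(fitness, n_features=6)
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
print("fitness:", round(best_score, 3))
```

Each chromosome is a bit mask over the features, so the classifier is retrained for every candidate subset; this is what distinguishes a wrapper from a filter method, and it is also why wrappers tend to cost more to run. In practice the stand-in classifier would be replaced by the learner of interest, and the fitness would be estimated by cross-validation rather than a single validation split.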

Keywords

Intrusion Detection; Machine Learning; Genetic Algorithm; Feature Selection; PCA

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)