Experimental Comparison of Network Intrusion Detection Models Solving Imbalanced Data Problem

  • Lee, Jong-Hwa (Kangwon University, IGP. in Medical Bigdata Convergence) ;
  • Bang, Jiwon (Kangwon University, IGP. in Medical Bigdata Convergence) ;
  • Kim, Jong-Wouk (Kangwon University, Dept. of Computer Science/Interdisciplinary Graduate Program) ;
  • Choi, Mi-Jung (Kangwon University, Dept. of Computer Science/IGP. in Medical Bigdata Convergence)
  • Received : 2020.11.01
  • Reviewed : 2020.12.22
  • Published : 2020.12.31

Abstract

As computing environments advance, IT provides growing benefits in fields such as healthcare, industry, communication, and culture, and quality of life is improving accordingly. These advanced network environments are also the target of a variety of malicious attacks. Firewalls and intrusion detection systems exist to detect such attacks in advance, but they are limited in detecting attacks that evolve day by day. Intrusion detection using machine learning is being actively studied to address this limitation, but imbalance in the training dataset causes false positives and false negatives. In this paper, we use random oversampling to address the class imbalance of the UNSW-NB15 dataset used for network intrusion detection. Through experiments, we compare and analyze the models' accuracy, precision, recall, F1-score, training and prediction time, and hardware resource consumption. Building on this study, we plan to extend the work toward more efficient network intrusion detection models by using imbalance-handling methods other than random oversampling together with high-performing models.
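The random oversampling step and the precision, recall, and F1-score metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_oversample` and `precision_recall_f1` are hypothetical, the records are made-up toy values rather than UNSW-NB15 data, and only the Python standard library is assumed.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        rows = [x for x, y in zip(features, labels) if y == cls]
        # Sample with replacement to fill the gap to the majority size.
        extra = [rng.choice(rows) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, made-up records (not UNSW-NB15 data): 6 "normal" vs 2 "attack".
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y = ["normal"] * 6 + ["attack"] * 2
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

In a sketch like this, oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the data used to measure the metrics.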

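Funding Information

This study was carried out as a basic research project supported by the National Research Foundation of Korea, funded by the Korean government (Ministry of Science and ICT) in 2020 (NRF-2020R1A2C1012117).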
