
Host-Based Intrusion Detection Model Using Few-Shot Learning


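  • 박대경 (Department of Computer Engineering, Intelligent Drone Convergence Major, Sejong University)
  • 신동일 (Department of Computer Engineering, Intelligent Drone Convergence Major, Sejong University)
  • 신동규 (Department of Computer Engineering, Intelligent Drone Convergence Major, Sejong University)
  • 김상수 (Cyber/Network Technology Center, Agency for Defense Development)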
  • Received : 2020.12.30
  • Accepted : 2021.02.25
  • Published : 2021.07.31

Abstract

As cyber attacks become more intelligent, existing intrusion detection systems (IDS) have difficulty detecting attacks that deviate from their stored patterns. To address this, deep learning-based IDS models that learn the patterns of intelligent attacks from data have emerged. Intrusion detection systems are divided into host-based and network-based systems according to where they are installed. Unlike a network-based IDS, a host-based IDS has the disadvantage of having to observe the inside and outside of the system as a whole, but it has the advantage of detecting intrusions that a network-based IDS cannot. This study therefore focuses on host-based intrusion detection. To evaluate and improve the performance of the host-based IDS model, we used the host-based Leipzig Intrusion Detection-Data Set (LID-DS), published in 2018. For the performance evaluation on this data set, the 1D vector data was converted and reconstructed into 3D image data so that the similarity between samples could be checked to identify whether each sample is normal or abnormal. Deep learning models also have the drawback that they must be retrained every time a new cyber attack method is discovered, which is inefficient because training time grows with the amount of data. To solve this problem, this paper proposes a Siamese Convolutional Neural Network (Siamese-CNN) that uses Few-Shot Learning, a method that shows excellent performance while learning from a small amount of data. Siamese-CNN determines whether attacks are of the same type from the similarity score between samples of cyber attacks converted into images. Accuracy was calculated using the Few-Shot Learning technique, and Siamese-CNN was compared with a Vanilla Convolutional Neural Network (Vanilla-CNN) to confirm its performance. Measuring Accuracy, Precision, Recall, and F1-Score showed that the Recall of the proposed Siamese-CNN model was about 6% higher than that of the Vanilla-CNN model.
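The two ideas in the abstract, reshaping a 1D feature vector into a 3D image and labelling a query by its similarity to a few labelled examples, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the conversion or the network, so the zero-padding scheme, the `side` and `channels` parameters, the flattening `embed` function (a stand-in for the shared CNN branch), and the `exp(-L1)` similarity score are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence

Image = List[List[List[float]]]  # height x width x channels


def vector_to_image(vec: Sequence[float], side: int, channels: int = 3) -> Image:
    """Zero-pad or truncate a 1D vector, then reshape it to side x side x channels.

    Assumption: the paper's actual 1D-to-3D conversion is not given in the abstract.
    """
    size = side * side * channels
    padded = list(vec[:size]) + [0.0] * max(0, size - len(vec))
    return [
        [padded[(r * side + c) * channels:(r * side + c + 1) * channels]
         for c in range(side)]
        for r in range(side)
    ]


def embed(image: Image) -> List[float]:
    """Hypothetical stand-in for the shared CNN branch: flatten the image."""
    return [value for row in image for pixel in row for value in pixel]


def similarity(a: Image, b: Image) -> float:
    """Similarity in (0, 1]: exp(-L1 distance) between the two embeddings."""
    ea, eb = embed(a), embed(b)
    return math.exp(-sum(abs(x - y) for x, y in zip(ea, eb)))


def few_shot_predict(query: Image, support: Dict[str, List[Image]]) -> str:
    """Return the label whose support examples are most similar on average."""
    def mean_score(examples: List[Image]) -> float:
        return sum(similarity(query, e) for e in examples) / len(examples)

    return max(support, key=lambda label: mean_score(support[label]))


# One labelled example per class (one-shot), with made-up feature values.
support = {
    "normal": [vector_to_image([0.1] * 12, side=2)],
    "attack": [vector_to_image([0.9] * 12, side=2)],
}
query = vector_to_image([0.8] * 10, side=2)  # padded with two zeros
print(few_shot_predict(query, support))  # prints "attack"
```

In a trained Siamese network, `embed` would be a learned convolutional encoder shared by both inputs. A new attack type can then be added by supplying a few support images rather than retraining on the full data set, which is the efficiency argument the abstract makes.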



Acknowledgement

This research was supported by the Defense Acquisition Program Administration and the Agency for Defense Development (UD2000014ED).
