
Web Attack Classification via WAF Log Analysis: AutoML, CNN, RNN, ALBERT

  • Youngbok Jo (Korea University)
  • Jaewoo Park (KAIST)
  • Mee Lan Han (Korea University)
  • Received : 2024.03.19
  • Accepted : 2024.06.28
  • Published : 2024.08.31

Abstract

As cyber attacks and threats grow more complex and evolve rapidly, building cyber threat detection systems with artificial intelligence (AI), a core technology of the Fourth Industrial Revolution, continues to attract attention. In particular, the Security Operations Centers (SOCs) of companies and government organizations increasingly use AI to implement SOAR (Security Orchestration, Automation and Response) solutions, with the aim of building and sharing Cyber Threat Intelligence (CTI), that is, evidence-based knowledge for anticipating future threats. In this paper, we review trends in cyber threat detection techniques that target network traffic and Web Application Firewall (WAF) log data, and we present a method for classifying the attack types in web traffic logs using TF-IDF (Term Frequency-Inverse Document Frequency) and automated machine learning (AutoML).
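The abstract names TF-IDF as the feature representation for WAF log entries. As a rough illustration of that idea only, the sketch below vectorizes request URIs with smoothed TF-IDF (raw counts times ln((1 + n) / (1 + df)) + 1, then L2 normalization) and, in place of the paper's AutoML search, which is not reproduced here, assigns each request to the class whose summed TF-IDF vector is most cosine-similar. The tokenizer, the nearest-centroid classifier, the class labels and the sample URIs are all hypothetical assumptions, not the authors' pipeline or data.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the paper's preprocessing): split a
# request URI into alphanumeric words and keep every punctuation character as its
# own token, since symbols such as ' < > and ../ carry much of the attack signal.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[^\sA-Za-z0-9_]")


def tokenize(uri: str) -> list[str]:
    return [tok.lower() for tok in TOKEN_RE.findall(uri)]


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Raw term count times IDF, L2-normalized; terms unseen in training are dropped."""
    vec = {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class NearestCentroidClassifier:
    """Hypothetical stand-in for the AutoML model: one summed TF-IDF vector per class."""

    def fit(self, uris: list[str], labels: list[str]) -> "NearestCentroidClassifier":
        docs = [tokenize(u) for u in uris]
        self.idf = fit_idf(docs)
        sums: dict[str, Counter] = {}
        for doc, label in zip(docs, labels):
            # Summing instead of averaging is fine: cosine ignores vector length.
            sums.setdefault(label, Counter()).update(tfidf(doc, self.idf))
        self.centroids = {label: dict(c) for label, c in sums.items()}
        return self

    def predict(self, uri: str) -> str:
        vec = tfidf(tokenize(uri), self.idf)
        return max(self.centroids, key=lambda lbl: cosine(vec, self.centroids[lbl]))


# Toy, hand-written URIs for illustration only (not from the paper's datasets).
TRAIN = [
    ("/index.php?id=1' OR '1'='1", "sqli"),
    ("/item?id=5 UNION SELECT username, password FROM users", "sqli"),
    ("/search?q=<script>alert(1)</script>", "xss"),
    ("/comment?text=<img src=x onerror=alert(1)>", "xss"),
    ("/download?file=../../../../etc/passwd", "path_traversal"),
    ("/static/../../windows/win.ini", "path_traversal"),
    ("/products?category=shoes&page=2", "normal"),
    ("/index.html", "normal"),
]

clf = NearestCentroidClassifier().fit([u for u, _ in TRAIN], [y for _, y in TRAIN])
print(clf.predict("/news?id=7 UNION SELECT password FROM admins"))  # sqli
```

A nearest-centroid rule is used only because it keeps the example dependency-free and easy to follow; an AutoML system would instead search over many model families and hyperparameters on top of the same TF-IDF features.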

Acknowledgement

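This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (Ministry of Science and ICT) in 2024 (No. 2021-0-00903, Development of physical-channel-based vulnerability verification and countermeasure technologies for high-reliability on-device deep learning accelerator design). This work was also supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (No. NRF-00252157) and by research funds provided by Korea University.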