Intrusion Detection Method Using Unsupervised Learning-Based Embedding and Autoencoder

Junwoo Lee; Kangseok Kim

doi:10.3745/KTSDE.2023.12.8.355

KIPS Transactions on Software and Data Engineering

Vol. 12, No. 8 / pp. 355-364 / 2023 / pISSN 2287-5905 / eISSN 2734-0503

Korea Information Processing Society

Junwoo Lee (Department of Knowledge Information Engineering, Ajou University);
Kangseok Kim (Department of Cyber Security, Ajou University)

Received: 2023.03.10
Reviewed: 2023.07.19
Published: 2023.08.31

https://doi.org/10.3745/KTSDE.2023.12.8.355

Abstract

As intelligent cyber threats continue to increase, existing pattern- or signature-based intrusion detection methods have difficulty detecting new types of cyber attacks. Research on anomaly detection methods that apply data-driven artificial intelligence techniques is therefore increasing. Supervised anomaly detection methods are difficult to use in real environments because they require a sufficient amount of labeled data for training. Recently, unsupervised methods that learn from normal data and detect anomalies by finding patterns in the data itself have been actively studied. This study aims to extract latent vectors that preserve useful sequence information from sequence log data and to develop an anomaly detection model using the extracted latent vectors. Word2Vec was used to create dense vector representations corresponding to the characteristics of each sequence, and unsupervised autoencoders were used to extract latent vectors from the sequence data expressed as dense vectors. Three autoencoder models were developed: a denoising autoencoder based on the GRU (Gated Recurrent Unit), a recurrent neural network suited to sequence data; an autoencoder based on a one-dimensional convolutional neural network, intended to address the limited short-term memory of GRU networks; and an autoencoder combining GRU and one-dimensional convolution. The experiments used the time-series-based NGIDS (Next Generation IDS Dataset) data. Compared with the models using the GRU-based autoencoder or the one-dimensional convolution-based autoencoder, the autoencoder combining GRU and one-dimensional convolution was more efficient in terms of the training time needed to extract useful latent patterns from the training data, and it showed more stable performance with smaller fluctuations in anomaly detection performance.
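To make the data flow of the combined model more concrete, the following is a minimal sketch of an encoder that applies a one-dimensional convolution to an embedded sequence and then runs a GRU over the convolved features, taking the final hidden state as the latent vector. It is an illustration under stated assumptions, not the authors' implementation: the ordering (convolution first, then GRU), the layer sizes, the omission of GRU biases, the gate convention, and all function names (`conv1d`, `gru_step`, `encode`, `init_params`) are hypothetical choices. The weights are random and untrained, the decoder and training loop are left out, and the random `embedded` input merely stands in for Word2Vec vectors.

```python
import math
import random


def conv1d(seq, kernels, bias):
    """Valid 1D convolution over time, followed by ReLU.

    seq: list of T vectors (each of length D).
    kernels: list of C filters, each a K x D matrix.
    Returns T - K + 1 vectors of length C.
    """
    k = len(kernels[0])
    out = []
    for t in range(len(seq) - k + 1):
        window = seq[t:t + k]
        step = []
        for filt, b in zip(kernels, bias):
            s = b + sum(w * x
                        for w_row, x_row in zip(filt, window)
                        for w, x in zip(w_row, x_row))
            step.append(max(0.0, s))
        out.append(step)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gru_step(x, h, p):
    """One GRU update (biases omitted for brevity; assumed convention:
    h_new = (1 - z) * h + z * h_candidate)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode(seq, params, hidden):
    """Conv1D features -> GRU over time -> final hidden state as latent vector."""
    feats = conv1d(seq, params["kernels"], params["bias"])
    h = [0.0] * hidden
    for x in feats:
        h = gru_step(x, h, params)
    return h


def rand_matrix(rng, rows, cols, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(rng, emb_dim, channels, kernel, hidden):
    """Random, untrained weights; sizes are illustrative assumptions."""
    p = {"kernels": [rand_matrix(rng, kernel, emb_dim) for _ in range(channels)],
         "bias": [0.0] * channels}
    for gate in "zrh":
        p["W" + gate] = rand_matrix(rng, hidden, channels)
        p["U" + gate] = rand_matrix(rng, hidden, hidden)
    return p


rng = random.Random(0)
# Stand-in for a Word2Vec-embedded sequence: 10 time steps, 4 dimensions each.
embedded = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
params = init_params(rng, emb_dim=4, channels=3, kernel=3, hidden=2)
latent = encode(embedded, params, hidden=2)
print(len(latent))
```

In a full pipeline along the lines the abstract describes, the encoder would be trained jointly with a decoder to reconstruct the input sequences, and the resulting latent vectors would then be passed to an anomaly detection model.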

Funding Information

This research was supported by the National Research Foundation of Korea (NRF), funded by the Korean government (Ministry of Science and ICT) (No. NRF-2019R1F1A1059036).

