A Pre-processing Study to Solve the Problem of Rare Class Classification of Network Traffic Data

Ryu, Kyung Joon;Shin, DongIl;Shin, DongKyoo;Park, JeongChan;Kim, JinGoog;

doi:10.3745/KTSDE.2020.9.12.411

KIPS Transactions on Software and Data Engineering

Volume 9, Issue 12 / pp. 411-418 / 2020 / 2287-5905 (pISSN) / 2734-0503 (eISSN)

Korea Information Processing Society



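Ryu, Kyung Joon (Department of Computer Engineering, Sejong University);
Shin, DongIl (Department of Computer Engineering, Sejong University);
Shin, DongKyoo (Department of Computer Engineering, Sejong University);
Park, JeongChan (Agency for Defense Development);
Kim, JinGoog (Agency for Defense Development)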

Received: 2020.07.22
Reviewed: 2020.11.19
Published: 2020.12.31

https://doi.org/10.3745/KTSDE.2020.9.12.411

Abstract

Intrusion detection systems (IDS) for information security are usually divided into two categories: signature-based IDS and anomaly-based IDS. Of the two, anomaly-based IDS, which analyze network traffic data with machine learning, have been studied actively. In this paper, we study pre-processing methods to counter the performance degradation caused by rare classes in the data used to learn attack types. We reconstructed the data on the basis of rare classes and semi-rare classes and tested whether the classification performance of machine learning improves. For each of the three reconstructed data sets, we performed hybrid feature selection, applying wrapper and filter methods in succession, and then normalized the data with a quantile scaler to complete the pre-processing. The prepared data were used to train a deep neural network (DNN) model, and classification performance was evaluated in terms of true positives (TP) and false negatives (FN). Classification performance improved on all three data sets.
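The abstract groups classes by rarity and scores the classifier by true positives and false negatives. The following is a minimal sketch of those two bookkeeping steps only, not the authors' implementation. The function names, the cut-offs `rare_max` and `semi_rare_max`, and the toy labels are all assumptions made for illustration; the abstract does not state how rare and semi-rare classes are defined.

```python
from collections import Counter


def group_classes_by_rarity(labels, rare_max, semi_rare_max):
    """Bucket class names into rare / semi_rare / common by sample count.

    rare_max and semi_rare_max are hypothetical cut-offs, not values
    taken from the paper.
    """
    counts = Counter(labels)
    groups = {"rare": [], "semi_rare": [], "common": []}
    for name, n in sorted(counts.items()):
        if n <= rare_max:
            groups["rare"].append(name)
        elif n <= semi_rare_max:
            groups["semi_rare"].append(name)
        else:
            groups["common"].append(name)
    return groups


def per_class_tp_fn(y_true, y_pred):
    """Return {class: (TP, FN)}.

    TP counts samples of a class that were predicted as that class;
    FN counts samples of a class that were predicted as anything else.
    """
    result = {}
    for truth, pred in zip(y_true, y_pred):
        tp, fn = result.get(truth, (0, 0))
        if truth == pred:
            tp += 1
        else:
            fn += 1
        result[truth] = (tp, fn)
    return result


# Toy labels, made up for this example (not data from the paper).
y_true = ["Benign"] * 6 + ["DoS"] * 3 + ["Infiltration"]
y_pred = ["Benign"] * 5 + ["DoS"] + ["DoS"] * 2 + ["Benign"] + ["Benign"]

groups = group_classes_by_rarity(y_true, rare_max=1, semi_rare_max=3)
scores = per_class_tp_fn(y_true, y_pred)
print(groups)
print(scores)
```

Reporting TP and FN per class, rather than a single overall accuracy, keeps a rare class visible: in the toy data the only rare-class sample is missed, which overall accuracy (7 of 10 correct) would largely hide.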


