Improvement of Network Intrusion Detection Rate by Using LBG Algorithm Based Data Mining

LBG 알고리즘 기반 데이터마이닝을 이용한 네트워크 침입 탐지율 향상

  • Park, Seong-Chul (The Department of Computer Engineering, Dongguk University) ;
  • Kim, Jun-Tae (The Department of Computer Engineering, Dongguk University)
  • 박성철 (동국대학교 컴퓨터공학과) ;
  • 김준태 (동국대학교 컴퓨터공학과)
  • Received : 2009.09.15
  • Accepted : 2009.10.10
  • Published : 2009.12.31

Abstract

Network intrusion detection has been continuously improved through the use of data mining techniques. Data mining approaches to intrusion detection fall into two categories: supervised learning, which uses class labels, and unsupervised learning, which does not. In this paper we study how to improve network intrusion detection accuracy using the LBG clustering algorithm, an unsupervised learning method. The K-means method, which starts from random initial centroids and clusters data by Euclidean distance, is vulnerable to noisy data and outliers. The nonuniform binary split algorithm clusters by binary decomposition without requiring initial values and is relatively fast. We apply to intrusion detection an EM (Expectation Maximization) based LBG algorithm that combines the strengths of these two algorithms. Experimental results on the KDD Cup dataset show that detection accuracy can be improved by using the LBG algorithm.
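The abstract describes the clustering procedure only in outline, so the following is a minimal sketch of the classic Linde-Buzo-Gray splitting scheme rather than the EM-based variant evaluated in the paper. It starts from a single centroid (the mean of all data), doubles the codebook by splitting each centroid, and refines the result with K-means-style assign/update iterations under Euclidean distance. The function names (`lbg`, `lloyd`), the parameters `eps` and `tol`, and the additive perturbation used for splitting are all assumptions made for illustration; the nonuniform binary split and the step that labels clusters as normal or intrusive are not reproduced here.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def lloyd(data, codebook, tol=1e-6, max_iter=100):
    """Refine a codebook with K-means-style assign/update iterations."""
    prev = math.inf
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        groups = [[] for _ in codebook]
        distortion = 0.0
        for p in data:
            dists = [sq_dist(p, c) for c in codebook]
            j = dists.index(min(dists))
            groups[j].append(p)
            distortion += dists[j]
        distortion /= len(data)
        # Update step: move each centroid to the mean of its group;
        # an empty group keeps its previous centroid.
        codebook = [mean(g) if g else c for g, c in zip(groups, codebook)]
        # Stop once the relative drop in average distortion is negligible.
        if prev - distortion <= tol * max(distortion, 1e-12):
            break
        prev = distortion
    return codebook


def lbg(data, size, eps=0.01):
    """Grow a codebook by repeated splitting until it has at least `size`
    centroids. The codebook doubles each round, so `size` is expected to be
    a power of two (hypothetical interface, not taken from the paper)."""
    codebook = [mean(data)]  # no random initial centroids are needed
    while len(codebook) < size:
        # Split each centroid into two copies shifted by +eps and -eps
        # (additive perturbation is an assumption of this sketch).
        codebook = [[x + s * eps for x in c] for c in codebook for s in (1, -1)]
        codebook = lloyd(data, codebook)
    return codebook


if __name__ == "__main__":
    # Synthetic two-cluster data, used only to exercise the sketch.
    rng = random.Random(0)
    data = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(50)]
    data += [[rng.gauss(10, 0.5), rng.gauss(10, 0.5)] for _ in range(50)]
    for centroid in sorted(lbg(data, 2)):
        print([round(x, 2) for x in centroid])
```

Because the first centroid is the global mean and later centroids come from splitting, the result does not depend on a random initialization, which is the property the abstract contrasts with plain K-means.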
