Developing an Intrusion Detection Framework for High-Speed Big Data Networks: A Comprehensive Approach

  • Received : 2018.03.04
  • Accepted : 2018.07.13
  • Published : 2018.08.31

Abstract

In network intrusion detection research, two characteristics are generally considered vital to building efficient intrusion detection systems (IDSs): an optimal feature selection technique and robust classification schemes. However, the emergence of sophisticated network attacks and the advent of big data concepts in intrusion detection domains require two more significant aspects to be addressed: employing an appropriate big data computing framework and utilizing a contemporary dataset to deal with ongoing advancements. As such, we present a comprehensive approach to building an efficient IDS with the aim of strengthening academic anomaly detection research in real-world operational environments. The proposed system has the following four characteristics: (i) it performs optimal feature selection using information gain and branch-and-bound algorithms; (ii) it employs machine learning techniques for classification, namely, Logistic Regression, Naïve Bayes, and Random Forest; (iii) it introduces bulk synchronous parallel processing to handle the computational requirements of large-scale networks; and (iv) it utilizes a real-time contemporary dataset generated by the Information Security Centre of Excellence at the University of New Brunswick (ISCX-UNB) to validate its efficacy. Experimental analysis shows the effectiveness of the proposed framework, which is able to achieve high accuracy, low computational cost, and reduced false alarms.
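The abstract names information gain as one of the feature selection criteria but does not spell out the computation. The following is a minimal sketch of the standard definition (entropy of the class labels minus their expected entropy after splitting on a feature), assuming discrete feature values. It is not the authors' implementation; the function names, feature names, and toy records are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels):
    """Return (feature name, gain) pairs sorted from most to least informative."""
    names = rows[0].keys()
    gains = {n: information_gain([r[n] for r in rows], labels) for n in names}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy flow records; these are NOT taken from the ISCX-UNB dataset.
rows = [
    {"protocol": "tcp", "flag": "SYN"},
    {"protocol": "tcp", "flag": "ACK"},
    {"protocol": "udp", "flag": "SYN"},
    {"protocol": "udp", "flag": "ACK"},
]
labels = ["attack", "normal", "attack", "normal"]

ranking = rank_features(rows, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data the `flag` feature separates the two classes perfectly while `protocol` carries no information about the label, so a ranking of this kind would keep `flag` and discard `protocol`. Continuous features would need to be discretized first, which this sketch does not attempt.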

Cited by

  1. A Study on an Anomaly Prediction System for Drones Using Big Data, vol.21, pp.2, 2018, https://doi.org/10.7472/jksii.2020.21.2.27