Developing an Intrusion Detection Framework for High-Speed Big Data Networks: A Comprehensive Approach

Siddique, Kamran; Akhtar, Zahid; Khan, Muhammad Ashfaq; Jung, Yong-Hwan; Kim, Yangwoo

doi:10.3837/tiis.2018.08.026

KSII Transactions on Internet and Information Systems (TIIS)

Volume 12, Issue 8 / pp. 4021-4037 / 2018 / 1976-7277 (pISSN) / 1976-7277 (eISSN)

Korean Society for Internet Information (한국인터넷정보학회)

Siddique, Kamran (Dongguk University) ;
Akhtar, Zahid (University of Memphis) ;
Khan, Muhammad Ashfaq (Dongguk University) ;
Jung, Yong-Hwan (Korea Institute of Science and Technology Information) ;
Kim, Yangwoo (Dongguk University)

Received : 2018.03.04
Accepted : 2018.07.13
Published : 2018.08.31

https://doi.org/10.3837/tiis.2018.08.026

Abstract

In network intrusion detection research, two characteristics are generally considered vital to building efficient intrusion detection systems (IDSs): an optimal feature selection technique and a robust classification scheme. However, the emergence of sophisticated network attacks and the arrival of big data in the intrusion detection domain require two further aspects to be addressed: employing an appropriate big data computing framework and using a contemporary dataset that keeps pace with ongoing advancements. We therefore present a comprehensive approach to building an efficient IDS, with the aim of strengthening academic anomaly detection research for real-world operational environments. The proposed system has four characteristics: (i) it performs optimal feature selection using information gain and branch-and-bound algorithms; (ii) it employs three machine learning techniques for classification, namely Logistic Regression, Naïve Bayes, and Random Forest; (iii) it introduces bulk synchronous parallel processing to handle the computational requirements of large-scale networks; and (iv) it validates its efficacy on a contemporary, real-time dataset generated by the Information Security Centre of Excellence at the University of New Brunswick (ISCX-UNB). Experimental analysis shows that the proposed framework is effective, achieving high accuracy, low computational cost, and reduced false alarms.
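The first feature-selection stage named above is information gain. The sketch below shows how such a stage can score and rank features, assuming discrete (or already discretized) features. It uses the standard definition IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v). The toy flow records, the feature names (`protocol`, `port_bucket`, `syn_flag`), and the helper names are hypothetical and do not come from the ISCX-UNB dataset. The branch-and-bound subset search, the classifiers, and the bulk synchronous parallel execution are not shown.

```python
import math
from collections import Counter

# Hypothetical toy flow records: (protocol, port_bucket, syn_flag) -> label.
FEATURE_NAMES = ["protocol", "port_bucket", "syn_flag"]
TOY_ROWS = [
    ("tcp", "low", 1),
    ("tcp", "low", 1),
    ("udp", "high", 0),
    ("tcp", "high", 0),
    ("udp", "low", 0),
    ("tcp", "high", 1),
]
TOY_LABELS = ["attack", "attack", "normal", "normal", "normal", "attack"]


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, feature_names):
    """Return (feature_name, IG) pairs sorted from most to least informative."""
    scores = []
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        scores.append((name, information_gain(column, labels)))
    return sorted(scores, key=lambda s: s[1], reverse=True)


def select_top_k(rows, labels, feature_names, k):
    """Hypothetical filter step: keep the k features with the highest IG."""
    return [name for name, _ in rank_features(rows, labels, feature_names)[:k]]


if __name__ == "__main__":
    for name, score in rank_features(TOY_ROWS, TOY_LABELS, FEATURE_NAMES):
        print(f"{name:12s} {score:.3f}")
```

On the toy data, `syn_flag` perfectly separates the two classes and scores 1.000. It is followed by `protocol` (0.459) and `port_bucket` (0.082). In a pipeline like the one the abstract describes, such a ranking would typically shrink the candidate set before a more expensive subset search such as branch-and-bound runs.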

Cited by

A Study on an Anomaly Prediction System for Drones Using Big Data (빅데이터를 활용한 드론의 이상 예측시스템 연구) vol.21, pp.2, 2018, https://doi.org/10.7472/jksii.2020.21.2.27