A Data-centric Analysis to Evaluate Suitable Machine-Learning-based Network-Attack Classification Schemes

  • Received : 2021.06.05
  • Published : 2021.06.30

Abstract

Since machine learning was invented, many machine-learning-based algorithms, from shallow learning to deep learning models, have been proposed for classification tasks. This abundance makes it difficult to choose a classification algorithm that improves classification/detection efficiency in a given network context. It also raises the questions of whether an algorithm performs well, and why it works on some problems but not on others. In this paper, we present a data-centric analysis that provides a way to select a suitable classification algorithm. This data-centric approach offers a new viewpoint for exploring the relationships between classification performance and the facts and figures of data sets.
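
As an illustration of what "facts and figures" of a data set could mean in practice, the following minimal Python sketch computes a few simple descriptive statistics for a labelled data set. The abstract does not say which statistics the paper uses, so the function name `dataset_meta_features`, the chosen statistics (sample count, feature count, class count, class entropy, imbalance ratio), and the toy data are all assumptions made for illustration only.

```python
import math
from collections import Counter


def dataset_meta_features(rows, labels):
    """Summarise a labelled data set with a few simple descriptive statistics.

    Hypothetical helper: the statistics below are illustrative assumptions,
    not the characteristics used in the paper.
    """
    if not rows or len(rows) != len(labels):
        raise ValueError("rows and labels must be non-empty and equally long")

    n_samples = len(rows)
    n_features = len(rows[0])
    counts = Counter(labels)

    # Shannon entropy of the class distribution, in bits.
    probs = [c / n_samples for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)

    return {
        "n_samples": n_samples,
        "n_features": n_features,
        "n_classes": len(counts),
        "class_entropy_bits": entropy,
        # Ratio of the largest class to the smallest class.
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
    }


# Toy data (made up): two numeric features per flow, one label per flow.
rows = [[0.1, 20], [0.4, 35], [0.2, 18], [0.9, 400]]
labels = ["benign", "benign", "benign", "attack"]
features = dataset_meta_features(rows, labels)
print(features)
```

Statistics of this kind, computed for several data sets, could then be tabulated against the measured accuracy of each candidate classifier to look for relationships between data characteristics and performance.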

Acknowledgement

This research is funded by the Hanoi University of Science and Technology (HUST) under project number T2020-SAHEP-010. We also thank our student, Miss Nguyen Thuy Linh, for her technical contribution.
