Study on Machine Learning Techniques for Malware Classification and Detection

  • Received : 2021.07.08
  • Accepted : 2021.10.08
  • Published : 2021.12.31

Abstract

The importance of artificial intelligence, and of machine learning in particular, has recently been emphasized. Artificial intelligence is already used, for example in intelligent surveillance cameras and other security systems, to solve problems that humans traditionally had to handle manually, one at a time. Information security is a domain where artificial intelligence is especially needed, because malicious code appears more often, and in greater volume, than human analysts can process. This study therefore examines the definition of artificial intelligence and machine learning, their execution methods, processes, and learning algorithms, and their use in various domains, with particular attention to how artificial intelligence technology is applied in the field of information security. On this basis, the study proposes a method for applying machine learning to the classification and detection of malware, which has increased rapidly in recent years. The proposed methodology converts software programs containing malicious code into images, then prepares the data and augments the dataset to create training data suitable for machine learning. A model trained on images created in this manner is expected to be effective in classifying and detecting malware.
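
The conversion and augmentation steps described above can be illustrated with a minimal Python sketch. The abstract does not specify the image width, the padding rule, or the augmentation operations, so everything here is an assumption made for illustration: each byte becomes one grayscale pixel, the last row is zero-padded, and the augmentations are a horizontal flip and a circular row shift. The function names (`bytes_to_image`, `flip_horizontal`, `shift_rows`, `augment`) are hypothetical and are not taken from the paper.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values in 0..255


def bytes_to_image(data: bytes, width: int = 16) -> Image:
    """Map each byte to one grayscale pixel and wrap the stream into rows."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Zero-pad so the final row is complete (the padding rule is an assumption).
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def flip_horizontal(image: Image) -> Image:
    """Mirror every row left to right."""
    return [row[::-1] for row in image]


def shift_rows(image: Image, offset: int) -> Image:
    """Circularly shift the rows downward by `offset` positions."""
    if not image:
        return []
    k = offset % len(image)
    if k == 0:
        return [row[:] for row in image]
    return image[-k:] + image[:-k]


def augment(image: Image) -> List[Image]:
    """Return the original image plus two illustrative variants."""
    return [image, flip_horizontal(image), shift_rows(image, 1)]


# Stand-in for the raw bytes of a binary file; a real pipeline would read
# the file contents with open(path, "rb").read().
sample = bytes(range(10))
img = bytes_to_image(sample, width=4)
variants = augment(img)
```

Whether a flip or a shift preserves the features that distinguish malware families is an open design question. In practice the augmentation set would be chosen by checking its effect on validation accuracy.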
