MalEXLNet: A semantic analysis and detection method of malware API sequence based on EXLNet model

  • Xuedong Mao (School of information science and engineering, Shenyang Ligong University) ;
  • Yuntao Zhao (School of information science and engineering, Shenyang Ligong University) ;
  • Yongxin Feng (Graduate School, Shenyang Ligong University) ;
  • Yutao Hu (School of information science and engineering, Shenyang Ligong University)
  • Received : 2024.07.10
  • Accepted : 2024.08.31
  • Published : 2024.10.31

Abstract

With continuing advances in malicious code polymorphism and obfuscation techniques, the performance of traditional machine learning-based methods for malware variant detection has gradually declined. Additionally, conventional pre-trained models cannot adequately capture the contextual semantic information of malicious code or appropriately represent polysemous words. To enhance the efficiency of malware variant detection, this paper proposes MalEXLNet, an intelligent semantic analysis and detection architecture for malware. The architecture takes malware API call sequences as input and employs an improved pre-training model for semantic vector representation, making effective use of the semantic information in those sequences. It constructs a hybrid deep learning model, CBAM+AttentionBiLSTM, for training and classification prediction. Furthermore, it incorporates the KMeansSMOTE algorithm to balance small-sample data, so that the model maintains robust performance when detecting variants from rare malware families. Comparative experiments on two general-purpose datasets, Ember and Catak, show that the proposed MalEXLNet architecture achieves excellent performance in malware classification and detection tasks, with accuracies of 98.85% and 94.46%, and macro-averaged and micro-averaged metrics exceeding 98% and 92%, on the two datasets respectively.
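The abstract reports both macro-averaged and micro-averaged metrics, which can diverge sharply when some malware families are rare. The following is a minimal sketch of how the two averages are commonly computed for F1, not the authors' evaluation code. The function names, the family labels `trojan` and `worm`, and the toy predictions are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    return {c: (tp[c], fp[c], fn[c]) for c in labels}


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when the class never occurs."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every family counts equally."""
    counts = per_class_counts(y_true, y_pred)
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(y_true, y_pred):
    """F1 over pooled counts: every sample counts equally."""
    counts = per_class_counts(y_true, y_pred)
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical toy data: a common family and a rare one.
y_true = ["trojan"] * 8 + ["worm"] * 2
y_pred = ["trojan"] * 8 + ["trojan", "worm"]

print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
print(f"micro F1 = {micro_f1(y_true, y_pred):.4f}")
```

In this toy case one of the two rare-family samples is misclassified. The micro average stays high because it is dominated by the common family, while the macro average drops because the rare family's F1 is weighted equally. This is why reporting both gives a fuller picture on imbalanced malware datasets.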
