A Study on Binary Code Similarity Detection

  • Bastien Schoonaert (Dept. of Electrical and Computer Engineering and Inter-University Semiconductor Research Center (ISRC), Seoul National University) ;
  • Hyun-Jun Kim (Dept. of Electrical and Computer Engineering and Inter-University Semiconductor Research Center (ISRC), Seoul National University) ;
  • Yun-Heung Paek (Dept. of Electrical and Computer Engineering and Inter-University Semiconductor Research Center (ISRC), Seoul National University)
  • Published : 2024.10.31

Abstract

Binary Code Similarity Detection (BCSD) plays a critical role in software security applications such as vulnerability detection and malware analysis. This review surveys both traditional and machine-learning-based approaches to BCSD. Traditional methods, such as control flow graph matching and symbolic execution, have demonstrated effectiveness but suffer from scalability issues, particularly with obfuscated code. Modern machine learning techniques, including graph neural networks and deep learning models, offer improved scalability and better adaptability across architectures. Despite these advancements, challenges remain in cross-platform detection, handling obfuscation, and deploying BCSD tools in real-world security scenarios. The review highlights recent innovations and outlines potential future directions for enhancing the robustness and efficiency of BCSD systems.
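
As a toy illustration of what a similarity score between two binary functions can look like, the sketch below compares the opcode n-grams of two instruction sequences with a Jaccard index. This is a deliberately simple baseline chosen for illustration, not one of the methods surveyed in the review. The opcode sequences and the names `opcode_ngrams`, `jaccard_similarity`, `func_a`, `func_b`, and `func_c` are all hypothetical.

```python
def opcode_ngrams(opcodes, n=2):
    """Return the set of contiguous opcode n-grams in a sequence."""
    return {tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)}


def jaccard_similarity(a, b, n=2):
    """Jaccard index of the opcode n-gram sets of two functions (0.0 to 1.0)."""
    grams_a, grams_b = opcode_ngrams(a, n), opcode_ngrams(b, n)
    if not grams_a and not grams_b:
        # Two sequences too short to yield any n-gram are treated as identical.
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical opcode sequences (assumed, not taken from any real binary).
# func_a and func_b stand in for the same routine under different compiler settings.
func_a = ["push", "mov", "sub", "mov", "call", "add", "pop", "ret"]
func_b = ["push", "mov", "sub", "call", "add", "pop", "ret"]
# func_c stands in for an unrelated routine.
func_c = ["xor", "cmp", "jne", "inc", "jmp", "ret"]

print(jaccard_similarity(func_a, func_b))  # 5 shared bigrams out of 8 distinct ones
print(jaccard_similarity(func_a, func_c))  # no shared bigrams
```

A purely syntactic score like this changes whenever a compiler or obfuscator reorders or substitutes instructions, which is one way to see why the abstract singles out obfuscation and cross-architecture detection as open challenges.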

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (RS-2023-00277326, 0.1), the Institute of Information & communications Technology Planning & Evaluation (IITP) under the artificial intelligence semiconductor support program to nurture the best talents grant funded by the Korean government (MSIT) (IITP-2023-RS-2023-00256081), the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No.2020-0-01840, Analysis on technique of accessing and acquiring user data in smartphone, 0.5), the BK21 FOUR program of the Education and Research Program for Future ICT Pioneers, Seoul National University in 2024, and the Inter-University Semiconductor Research Center (ISRC).
