Comparison of Code Similarity Analysis Performance of funcGNN and Siamese Network

  • Choi, Dong-Bin (Dept. of Computer Science, Dankook University) ;
  • Jo, In-su (Dept. of Computer Science, Dankook University) ;
  • Park, Young B. (Dept. of Software Science, Dankook University)
  • Received : 2021.09.02
  • Accepted : 2021.09.16
  • Published : 2021.09.30

Abstract

As artificial intelligence technologies, including deep learning, advance, they are being applied to code similarity analysis. The traditional method converts source code into a control flow graph (CFG) and then calculates the graph edit distance (GED) between graphs. Building on this, some studies estimate the GED from the converted CFG with a trained graph neural network (GNN), while others render the CFG as an image and analyze code similarity with a CNN. To determine which approach is likely to be more effective and efficient for future research on AI-based code similarity analysis, this paper measures code similarity with funcGNN, a GNN-based code similarity model, and with a Siamese network, an image similarity model, and compares their accuracy. In the analysis, the error rate of the Siamese network (0.0458) was higher than that of funcGNN (0.0362).
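The traditional baseline mentioned above, the GED between two CFGs, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes unlabeled nodes, unit costs for every node or edge insertion and deletion, and brute-force search that is only feasible for very small graphs. The function names `graph_edit_distance` and `ged_similarity` are hypothetical, and the `exp(-normalized GED)` similarity is an assumed normalization commonly used in GNN-based GED models, not a detail stated in this abstract.

```python
from itertools import permutations
from math import exp


def graph_edit_distance(n1, edges1, n2, edges2):
    """Exact GED between two small unlabeled directed graphs (unit costs).

    Nodes are numbered 0..n1-1 and 0..n2-1. The smaller graph is padded
    with isolated dummy nodes, and every node mapping is tried.
    """
    n = max(n1, n2)
    e1, e2 = set(edges1), set(edges2)
    node_cost = abs(n1 - n2)  # each unmatched node is one insertion or deletion
    best = None
    for perm in permutations(range(n)):
        # Relabel the edges of graph 1 under this candidate mapping.
        mapped = {(perm[u], perm[v]) for u, v in e1}
        # Edges present on only one side must be deleted or inserted.
        cost = node_cost + len(mapped ^ e2)
        if best is None or cost < best:
            best = cost
    return best


def ged_similarity(n1, edges1, n2, edges2):
    """Map GED to (0, 1]; assumed normalization by the mean node count."""
    ged = graph_edit_distance(n1, edges1, n2, edges2)
    return exp(-ged / ((n1 + n2) / 2))


# CFG of straight-line code: entry -> statement -> exit
straight = (3, [(0, 1), (1, 2)])
# CFG of an if/else: entry branches to two blocks that rejoin at the exit
branch = (4, [(0, 1), (0, 2), (1, 3), (2, 3)])

print(graph_edit_distance(*straight, *branch))  # 3: one node and two edges inserted
print(ged_similarity(*straight, *branch))
```

The factorial cost of this exhaustive search is the motivation for learned approximations such as the GNN-based approach the abstract describes.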

Acknowledgement

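This research was carried out as a result of the University ICT Support Program of the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP-2020-2017-0-01628).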
