Detection of an Open-Source Software Module based on Function-level Features

함수 수준 특징정보 기반의 오픈소스 소프트웨어 모듈 탐지

  • Received : 2015.01.12
  • Accepted : 2015.03.02
  • Published : 2015.06.15

Abstract

As open-source software (OSS) becomes more widely used, many users violate OSS license terms or reuse vulnerable OSS modules. A technique is therefore needed to determine whether a binary program includes an OSS module. In this paper, we propose an efficient technique for detecting a particular OSS module in an executable program using function-level features. Conventional methods are ill-suited to determining whether a module is contained in a specific program because they usually measure the similarity between whole programs. Our technique extracts function-level features (instructions, control flow graphs, and structural attributes of functions) from both the program and the module, and compares their similarity to determine whether the executable contains the module. To demonstrate the efficiency of the proposed technique, we evaluate it in terms of feature size, detection accuracy, execution overhead, and resilience to compiler optimizations.

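With the growing use of open-source software (OSS), disputes and damage caused by license violations and the reuse of vulnerable source code have become frequent. As a result, a technique is needed to check, at the executable (binary) level, whether a program contains an OSS module. This paper proposes a technique that detects OSS modules in binaries using function-level features. Existing theft detection techniques based on software features (birthmarks) compare the similarity between whole programs, so they are unsuitable for detecting OSS modules included as part of a program. In this paper, we extract function-level instructions, control flow graphs (CFGs), and improved function-level structural features, and compare their similarity to detect unauthorized use of an OSS module. To evaluate the efficiency of the proposed technique and the OSS detection performance of each feature, we measured the feature size, OSS module detection time and accuracy, and robustness against compiler optimizations.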
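
The abstract's core idea, comparing a module against a program function by function rather than whole program against whole program, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each function has already been disassembled into a list of opcode mnemonics, uses only opcode k-grams as the feature (the paper also uses control flow graphs and structural attributes), and picks Jaccard similarity and a 0.8 threshold purely for illustration. The names `kgrams`, `jaccard`, and `module_containment` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed input format: function name -> opcode mnemonics of that function.
Functions = Dict[str, List[str]]


def kgrams(opcodes: List[str], k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-length opcode windows of one function."""
    if len(opcodes) < k:
        # Short functions contribute a single feature (or none if empty).
        return {tuple(opcodes)} if opcodes else set()
    return {tuple(opcodes[i:i + k]) for i in range(len(opcodes) - k + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Jaccard similarity of two feature sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def module_containment(program: Functions, module: Functions,
                       k: int = 3, threshold: float = 0.8) -> float:
    """Fraction of module functions that have a similar function in program."""
    if not module:
        return 0.0
    program_features = [kgrams(ops, k) for ops in program.values()]
    matched = 0
    for ops in module.values():
        features = kgrams(ops, k)
        # Best match over all program functions; names are ignored because
        # symbols are typically stripped from release binaries.
        best = max((jaccard(features, p) for p in program_features), default=0.0)
        if best >= threshold:
            matched += 1
    return matched / len(module)


# Toy data (invented for illustration, not taken from the paper).
module = {
    "inflate": ["push", "mov", "sub", "call", "add", "pop", "ret"],
    "crc": ["push", "mov", "xor", "shr", "xor", "pop", "ret"],
}
program_with_module = {
    "main": ["push", "mov", "call", "call", "pop", "ret"],
    "sub_401000": list(module["inflate"]),
    "sub_401080": list(module["crc"]),
}
program_without_module = {"main": ["push", "mov", "call", "call", "pop", "ret"]}

score_with = module_containment(program_with_module, module)
score_without = module_containment(program_without_module, module)
```

Because the score is normalized by the number of module functions, a large host program does not dilute it, which is the property whole-program similarity lacks. The sketch lets several module functions match the same program function; a one-to-one assignment would be stricter.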

Acknowledgement

Supported by: Korea Copyright Commission; Institute for Information & communications Technology Promotion (IITP)
