• Title/Summary/Keyword: Source Code Similarity


A Program Similarity Evaluation Algorithm (프로그램 유사도 평가 알고리즘)

  • Kim Young-Chul; Hwang Seog-Chan; Choi Jaeyoung
    • Journal of Internet Computing and Services / v.6 no.1 / pp.51-64 / 2005
  • In this paper, we introduce a system for evaluating the similarity of C program source code using a method that compares syntax trees with each other. This method has two characteristic features compared with other systems. First, it is not sensitive to program style, such as indentation, white space, and comments, or to changes in the order of control structures such as statements, code blocks, and procedures. Second, it can detect syntax errors because it uses a parsing technique. We introduce algorithms for the similarity evaluation method and for a grouping method that reduces the number of comparisons. In the experimental section, we show the test results of the program similarity evaluation and the reduction in iterations achieved by the grouping algorithm.

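The style-insensitivity described in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses Python's standard `ast` parser as a stand-in for a C parser, and it compares only counts of node types (a deliberately crude, order-insensitive measure) rather than matching trees. The function names `node_profile` and `tree_similarity` are hypothetical.

```python
import ast
from collections import Counter

def node_profile(source):
    """Count syntax-tree node types; comments and layout never reach the tree."""
    tree = ast.parse(source)  # raises SyntaxError on malformed input
    return Counter(type(node).__name__ for node in ast.walk(tree))

def tree_similarity(source_a, source_b):
    """Share of node-type occurrences common to both trees, in [0, 1]."""
    profile_a, profile_b = node_profile(source_a), node_profile(source_b)
    shared = sum((profile_a & profile_b).values())  # multiset intersection
    total = sum(profile_a.values()) + sum(profile_b.values())
    return 2 * shared / total if total else 1.0

original = "x = 1\ny = 2\n"
restyled = "# added comment\ny   =   2\n\nx = 1\n"  # reordered and re-spaced
print(tree_similarity(original, restyled))
```

Because the input is parsed before it is compared, a malformed submission surfaces as a `SyntaxError` instead of a misleading score, which mirrors the second feature the abstract mentions.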

Cross-Language Clone Detection based on Common Token (공통 토큰에 기반한 서로 다른 언어의 유사성 검사)

  • Hong, Sung-Moon; Kim, Hyunha; Lee, Jaehyung; Park, Sungwoo; Mo, Ji-Hwan; Doh, Kyung-Goo
    • Journal of Software Assessment and Valuation / v.14 no.2 / pp.35-44 / 2018
  • Tools for detecting cross-language clones usually compare abstract-syntax-tree representations of source code, an approach that lacks scalability. To compare large bodies of source code at a practical scale, we need a similarity checking technique that works at the token level. In this paper, we define common tokens that represent all tokens commonly used in programming languages of different paradigms. Source code in each language is then transformed into a list of common tokens, and the lists are compared. Experimental results using exEyes show that our proposed method using common tokens is effective in detecting cross-language clones.
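As a rough illustration of the common-token idea, the sketch below maps keywords from two languages onto shared token classes and compares the resulting lists. The `COMMON` table, the regular expression, and the use of `difflib.SequenceMatcher` are all assumptions made for this example; the abstract does not describe the actual token set or the comparison used by exEyes.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mapping from language-specific keywords to shared token classes.
COMMON = {
    "def": "FUNC", "function": "FUNC",
    "if": "IF", "else": "ELSE", "elif": "ELSE",
    "for": "LOOP", "while": "LOOP",
    "return": "RETURN",
}

# Identifiers/keywords, integer literals, or any single punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")

def to_common_tokens(source):
    """Turn source text into a list of language-neutral token classes."""
    tokens = []
    for lexeme in TOKEN_RE.findall(source):
        if lexeme in COMMON:
            tokens.append(COMMON[lexeme])
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append("ID")
        elif lexeme[0].isdigit():
            tokens.append("NUM")
        elif lexeme in "+-*/%<>=!":
            tokens.append("OP")
        # Brackets, colons, commas, and semicolons are dropped as pure syntax.
    return tokens

def token_similarity(source_a, source_b):
    """Similarity of two common-token lists, in [0, 1]."""
    matcher = SequenceMatcher(
        None, to_common_tokens(source_a), to_common_tokens(source_b), autojunk=False
    )
    return matcher.ratio()

python_src = "def add(a, b):\n    return a + b\n"
js_src = "function add(x, y) { return x + y; }"
print(to_common_tokens(python_src))
print(token_similarity(python_src, js_src))
```

Working on flat token lists avoids building and matching trees, which is the scalability argument the abstract makes for a token-level technique.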

Domain Analysis of Device Drivers Using Code Clone Detection Method

  • Ma, Yu-Seung; Woo, Duk-Kyun
    • ETRI Journal / v.30 no.3 / pp.394-402 / 2008
  • Domain analysis is the process of analyzing related software systems in a domain to find their common and variable parts. Device drivers are highly suitable for domain analysis because drivers of the same domain are implemented similarly for each device and each system that they support. Considering this characteristic, this paper introduces a new approach to the domain analysis of device drivers. Our method uses a code clone detection technique to extract similarity among device drivers of the same domain. To examine the applicability of our method, we investigated all the device drivers in a Linux source tree. Results showed that many reusable, similar code fragments can be identified by the code clone detection method. We also investigated whether our method is applicable to other kernel sources. However, the results show that the code clone detection method is not useful for the domain analysis of all kernel sources. That is, the applicability of the code clone detection method to domain analysis is a peculiar feature of device drivers.


A Method for Efficient Malicious Code Detection based on the Conceptual Graphs (개념 그래프 기반의 효율적인 악성 코드 탐지 기법)

  • Kim Sung-Suk; Choi Jun-Ho; Bae Young-Geon; Kim Pan-Koo
    • The KIPS Transactions:PartC / v.13C no.1 s.104 / pp.45-54 / 2006
  • Many techniques have been applied to the detection of malicious behavior. However, the techniques currently in practice struggle with the many variations of the original malicious behavior, and they cannot respond to new forms of behavior appropriately or promptly. They also have unresolved limitations, such as false positives and false negatives. To address these problems, we propose a new method. To detect malicious code, we represent the basic units of source code with conceptual graphs. We use conceptual graphs to define malicious behavior, and then compare the similarity of malicious behaviors by testing the formalized values generated from the predefined graphs in the code. In this paper, we show how to build a conceptual graph and propose an efficient similarity measure for discerning malicious behavior. Our experiments show that the method detects malicious code more efficiently.

A Study on Similarity Analysis of SNMP MIB File (SNMP MIB 파일의 유사도 분석에 관한 연구)

  • Chun, Byung-Tae
    • Journal of Software Assessment and Valuation / v.15 no.1 / pp.37-42 / 2019
  • Similarity analysis, one of the methods used to resolve disputes over computer programs, has been studied in many forms. This paper concerns the quantitative similarity analysis of MIB (Management Information Base) files. Quantitative similarity means that the source code of two computer programs is analyzed and the results are compared against a certain standard. The source code analyzed here is a program that provides network device management functions, such as configuration management, fault management, and performance management, using the SNMP protocol for WiMAX CPE devices. Here, WiMAX refers to the IEEE 802.16 wireless network standard protocol and can be classified into fixed WiMAX and mobile WiMAX. A WiMAX CPE is a wireless Internet terminal that is used at a fixed location in a customer's home or office. In this paper, we analyze the similarity between the MIB files of company A and company B. We examine whether the MIB file leaked from the victim company is merely a list describing product specifications or whether it can be recognized as having property value.

Measuring Similarity of Android Applications Using Method Reference Frequency and Manifest Information (메소드 참조 빈도와 매니페스트 정보를 이용한 안드로이드 애플리케이션들의 유사도 측정)

  • Kim, Gyoosik; Hamedani, Masoud Reyhani; Cho, Seong-je; Kim, Seong Baeg
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.3 / pp.15-25 / 2017
  • As the value and importance of software grow, software theft and piracy are becoming a much larger problem. To tackle this problem, an accurate method for detecting software theft and piracy is needed. In particular, although software theft is relatively easy in the case of Android applications (apps), illegal apps have not been properly screened in Android markets. In this paper, we propose a method to effectively measure the similarity between Android apps for detecting software theft at the executable file level. Our proposed method extracts method reference frequency and manifest information through static analysis of executable Android apps as the main features for similarity measurement. Each app is represented as an n-dimensional vector of these features, and cosine similarity is then used as the similarity measure. We demonstrate the effectiveness of our proposed method by evaluating its accuracy in comparison with typical source code-based similarity measurement methods. In experiments on Android apps for which both the source file and the executable file are available, we found that our similarity degree measured at the executable file level is almost equivalent to the existing well-known similarity degree measured at the source file level.
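The vector comparison step in the abstract above can be sketched as follows. Only the use of cosine similarity over feature-frequency vectors comes from the abstract; the sparse `Counter` representation, the function name, and the method names and counts in the sample data are invented for illustration.

```python
import math
from collections import Counter

def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(freq_a[key] * freq_b[key] for key in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(value * value for value in freq_a.values()))
    norm_b = math.sqrt(sum(value * value for value in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty feature vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)

# Made-up method reference counts for two apps (not real measurements).
app_a = Counter({"Activity.onCreate": 3, "Log.d": 4})
app_b = Counter({"Activity.onCreate": 3, "Log.d": 4})
app_c = Counter({"Socket.connect": 5})

print(cosine_similarity(app_a, app_b))
print(cosine_similarity(app_a, app_c))
```

Cosine similarity depends on the direction of the vectors rather than their length, so two apps with the same proportions of method references score highly even when one is larger overall.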

A Study on the Comparison of Similarity between Master Manuals of Appraisal Program (감정대상 프로그램의 마스터 매뉴얼 유사성 비교에 관한 연구)

  • Chun, Byung-Tae; Lee, Chang-Hoon
    • Journal of Software Assessment and Valuation / v.15 no.2 / pp.1-7 / 2019
  • Program similarity analysis consists of substantial similarity and access. Substantial similarity is a quantitative judgment of how similar the program source code is. Access determines the degree of similarity by analyzing comments in the program or other contextual evidence. Manuals may also be the subject of legitimacy analysis. Manuals can be divided into three types. First, a master manual is a document created during the development stage of a product. It is a user manual that contains all the functionality of the product and its derivatives. Second, the customer manual is a manual that is open only to the primary customer and orderer. Third, the user manual is a document that is applied at the final OEM production stage and is open to the end purchaser. In this paper, we compare the master manual seized from the suspect and the master manual provided by the suspect on the Internet. We then determine how similar these master manuals are and whether they include the victim company's original content and property value.

Software Similarity Detection Using Highly Credible Dynamic API Sequences (신뢰성 높은 동적 API 시퀀스를 이용한 소프트웨어 유사성 검사)

  • Park, Seongsoo; Han, Hwansoo
    • Journal of KIISE / v.43 no.10 / pp.1067-1072 / 2016
  • Software birthmarks, which are unique characteristics of software, are used to detect software plagiarism or software similarity. Generally, software birthmarks are divided into static birthmarks and dynamic birthmarks, each of which has evident pros and cons depending on the extraction method. In this paper, we propose a method for extracting API sequence birthmarks using dynamic analysis and for detecting similarity between executable codes. Dynamic birthmarks based on API sequences extract API functions during the execution of programs. The extracted API sequences often include all the API functions called from the start to the end of the program. In contrast, our dynamic birthmark scheme extracts only the API functions called directly from the executable code. It then uses a sequence alignment algorithm to calculate the similarity metric effectively. We evaluate the birthmark with several open source software programs to verify its reliability and credibility. Our dynamic birthmark scheme based on the extracted API sequences can be used in similarity tests of executable codes.
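The abstract does not say which sequence alignment algorithm is used, so the sketch below substitutes the longest common subsequence (LCS), one simple gap-only form of alignment, to show how two API call traces might be scored. The normalization `2 * LCS / (len(a) + len(b))`, the function names, and the sample traces are assumptions made for this example.

```python
def lcs_length(seq_a, seq_b):
    """Length of the longest common subsequence, using row-by-row dynamic programming."""
    previous = [0] * (len(seq_b) + 1)
    for item_a in seq_a:
        current = [0]
        for j, item_b in enumerate(seq_b, start=1):
            if item_a == item_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]

def api_sequence_similarity(seq_a, seq_b):
    """Fraction of calls that align in order across both traces, in [0, 1]."""
    if not seq_a and not seq_b:
        return 1.0
    return 2 * lcs_length(seq_a, seq_b) / (len(seq_a) + len(seq_b))

# Illustrative traces only; not taken from the paper's experiments.
trace_a = ["CreateFile", "ReadFile", "CloseHandle"]
trace_b = ["CreateFile", "WriteFile", "ReadFile", "CloseHandle"]
print(lcs_length(trace_a, trace_b))
print(api_sequence_similarity(trace_a, trace_b))
```

An order-preserving measure such as this tolerates inserted or removed calls while still penalizing reordering, which is why alignment is a natural fit for call sequences.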

Hierarchical Clustering Methodology for Source Code Plagiarism Detection (계층적 군집화 기법을 이용한 소스 코드 표절 검사)

  • Sohn, Ki-Rack; Moon, Seung-Mi
    • Journal of The Korean Association of Information Education / v.11 no.1 / pp.91-98 / 2007
  • Plagiarism is a serious problem in school education due to current technologies such as the Internet and word processors. This paper presents a way to detect source code plagiarism using similarity based on string comparison methods. The main contribution is the use of a hierarchical agglomerative clustering technique to classify plagiarism groups, which are then visualized as a dendrogram. Graders can set an empirical threshold on the dendrogram to navigate plagiarism groups. We evaluated the performance of the presented method with real-world data. The results showed the usefulness and applicability of this method.

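The grouping step described in the abstract above can be sketched in a few lines. The abstract does not name the string comparison method or the linkage criterion, so this example assumes `difflib.SequenceMatcher` for pairwise similarity and single linkage for merging; the function names, the threshold of 0.8, and the sample submissions are hypothetical. It cuts the hierarchy at a threshold instead of drawing the dendrogram.

```python
from difflib import SequenceMatcher

def similarity(text_a, text_b):
    """String similarity in [0, 1] (an assumed stand-in for the paper's measure)."""
    return SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()

def cluster(sources, threshold):
    """Single-linkage agglomerative clustering, stopped at a similarity threshold."""
    names = list(sources)
    pair_sim = {
        frozenset((p, q)): similarity(sources[p], sources[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }
    groups = [{name} for name in names]

    def linkage(group_a, group_b):
        # Single linkage: the similarity of the closest pair across the two groups.
        return max(pair_sim[frozenset((p, q))] for p in group_a for q in group_b)

    while len(groups) > 1:
        best, i, j = max(
            (linkage(groups[i], groups[j]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        )
        if best < threshold:
            break  # no remaining pair of groups is similar enough to merge
        groups[i] |= groups[j]
        del groups[j]
    return groups

submissions = {
    "s1": "int main(){int a=1;int b=2;return a+b;}",
    "s2": "int main(){int x=1;int y=2;return x+y;}",
    "s3": "print('hello world')",
}
print(cluster(submissions, threshold=0.8))
```

Raising the threshold yields smaller, stricter groups and lowering it yields larger ones, which corresponds to a grader moving the cut line up or down the dendrogram.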

A Study on Analysis of Source Code for Program Protection in ICT Environment (ICT 환경에서 프로그램보호를 위한 소스코드 분석 사례 연구)

  • Lee, Seong-Hoon; Lee, Dong-Woo
    • Journal of Convergence for Information Technology / v.7 no.4 / pp.69-74 / 2017
  • ICT (Information Communication Technology) is a keyword in today's society. Various government support programs have brought many quantitative and qualitative changes to the software industry. Software consists of instructions (computer programs) and data structures, and it can be divided into application programs and system programs. Application programs are developed to perform specific functions or to provide entertainment. With the rapid growth of the software industry, program copyright has become one of the resulting problems. In this paper, we describe an analysis method for program similarity based on source code.