• Title/Summary/Keyword: Source Code Similarity

Research on the Classification Model of Similarity Malware using Fuzzy Hash (퍼지해시를 이용한 유사 악성코드 분류모델에 관한 연구)

  • Park, Changwook;Chung, Hyunji;Seo, Kwangseok;Lee, Sangjin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.22 no.6
    • /
    • pp.1325-1336
    • /
    • 2012
  • In the past, about 10 different kinds of malicious code were found per day on average. However, the number of malicious codes found has increased rapidly over the last 10 years, reaching over 55,000. Most of these are not new kinds of malicious code but variants of existing ones, in which functions are newly added or the code is modified to evade anti-virus detection. To deal with this volume of new malicious codes and variants, we need to compare them against past malicious codes, measure their similarity, and classify them accordingly. Existing similarity calculation methods compare external factors such as IPs, URLs, APIs, and strings, or compare at the source code level. These methods take a long time because the number of malicious codes and comparable factors keeps increasing, which has led to the use of fuzzy hashing to reduce the amount of calculation. Existing fuzzy hashing, however, has some limitations that cause problems for similarity calculation. Therefore, this paper proposes a new comparison method for malicious codes that improves the performance of similarity calculation using fuzzy hashing, together with a classification method that employs the new comparison method.
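
The idea behind fuzzy hashing can be illustrated with a minimal sketch. It is neither the method proposed in the paper nor a reimplementation of any existing fuzzy hashing tool: it cuts the input at content-defined boundaries, hashes each piece to a short token, and compares the token sequences. The window size, trigger rule, token length, and function names are all assumptions chosen for illustration.

```python
import hashlib
from difflib import SequenceMatcher

WINDOW = 7  # assumed size of the sliding window, in bytes


def fuzzy_signature(data: bytes, block_size: int = 16) -> list:
    """Split data at content-defined boundaries and hash each piece.

    A boundary is declared whenever the sum of the last WINDOW bytes
    is congruent to block_size - 1 modulo block_size (an assumed,
    deliberately simple trigger rule).
    """
    tokens = []
    start = 0
    for i in range(len(data)):
        window = data[max(0, i - WINDOW + 1): i + 1]
        if sum(window) % block_size == block_size - 1:
            piece = data[start: i + 1]
            tokens.append(hashlib.sha256(piece).hexdigest()[:2])
            start = i + 1
    if start < len(data):  # hash whatever is left after the last boundary
        tokens.append(hashlib.sha256(data[start:]).hexdigest()[:2])
    return tokens


def fuzzy_similarity(a: bytes, b: bytes) -> float:
    """Similarity in [0, 1] between the two token sequences."""
    return SequenceMatcher(None, fuzzy_signature(a), fuzzy_signature(b)).ratio()


if __name__ == "__main__":
    original = bytes(range(256)) * 4
    variant = original + b"XYZ"  # small change appended at the end
    print(fuzzy_similarity(original, variant))
```

Because the boundaries depend only on nearby bytes, a local change alters only the pieces around it, so a variant keeps most of its tokens. A cryptographic hash of the whole file would change completely.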

Learning Source Code Context with Feature-Wise Linear Modulation to Support Online Judge System (온라인 저지 시스템 지원을 위한 Feature-Wise Linear Modulation 기반 소스코드 문맥 학습 모델 설계)

  • Hyun, Kyeong-Seok;Choi, Woosung;Chung, Jaehwa
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.11
    • /
    • pp.473-478
    • /
    • 2022
  • Evaluation learning based on code testing is becoming a popular approach in programming education via online judge (OJ) systems. Many recent papers describe how to detect plagiarism through source code similarity analysis to support OJ. However, deep learning-based research to support automated tutoring is insufficient. In this paper, we propose Input Side and Output Side FiLM models to predict whether the input code will pass or fail. By applying the Feature-wise Linear Modulation (FiLM) technique to a GRU, our model can learn combined information from Java bytecode and information about the problem the code tries to solve. In the experimental design, a balanced sampling technique was applied to distribute the data evenly, because the data collected by OJ is imbalanced. Among the proposed models, the Input Side FiLM model showed the highest performance of 73.63%. The results show that students can check whether their code will pass or fail before receiving the OJ evaluation, which could provide basic feedback for improvement.
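
FiLM modulates each feature with a scale and a shift computed from a conditioning input: FiLM(h) = gamma(c) * h + beta(c), applied element-wise. The sketch below shows only that operation in plain Python. Treating `features` as a GRU state over bytecode and `condition` as a problem embedding is an assumption made for illustration, as are all weights and names. It is not the paper's model.

```python
def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(features, condition, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: gamma(c) * h + beta(c)."""
    gamma = linear(w_gamma, b_gamma, condition)  # per-feature scale
    beta = linear(w_beta, b_beta, condition)     # per-feature shift
    return [g * h + b for g, h, b in zip(gamma, features, beta)]


if __name__ == "__main__":
    features = [3.0, 4.0]    # hypothetical code representation
    condition = [1.0, 2.0]   # hypothetical problem embedding
    w_gamma = [[1.0, 0.0], [0.0, 1.0]]
    b_gamma = [0.0, 0.0]
    w_beta = [[0.0, 0.0], [0.0, 0.0]]
    b_beta = [0.5, -0.5]
    print(film(features, condition, w_gamma, b_gamma, w_beta, b_beta))
```

In a trained model the scale and shift weights would be learned together with the rest of the network; the fixed values here only make the arithmetic easy to follow.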

Applying Genomic Sequence Alignment Methodology for Source Codes Plagiarism Detection (유전체 서열의 정렬 기법을 이용한 소스 코드 표절 검사)

  • 강은미;황미녕;조환규
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.9 no.3
    • /
    • pp.352-367
    • /
    • 2003
  • The syntactic and semantic characteristics of a computer program can be represented by the keyword sequence extracted from its source code. Therefore, the similarities and differences between two programs can be identified clearly by comparing the keyword sequences obtained from them. Various methods for measuring the similarity of two sequences have already been studied intensively in bioinformatics for the manipulation of biological genetic sequences. In this paper, we propose a new method for measuring the similarity of two programs and detecting partial plagiarism by exploiting sequence alignment techniques. To evaluate the performance of the proposed method, we experimented with actual program codes submitted by 70 students attending a Data Structure course in 2001. The experimental results show that the proposed method is more effective and powerful than the fingerprint method, which is the most commonly used method for plagiarism detection.
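
As a rough illustration of aligning keyword sequences, the sketch below computes a local alignment score in the style of Smith-Waterman. The scoring values, the normalization, and the example keyword sequences are assumptions, and the paper's own alignment scheme may differ.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two token sequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Scores are floored at 0 so an alignment can start anywhere.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def keyword_similarity(a, b, match=2):
    """Score normalized by the best possible score of the shorter sequence."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return local_alignment_score(a, b, match=match) / (match * shorter)


if __name__ == "__main__":
    prog_a = ["int", "for", "if", "return"]
    prog_b = ["while", "for", "if", "return", "int"]
    print(local_alignment_score(prog_a, prog_b))
    print(keyword_similarity(prog_a, prog_b))
```

Local alignment suits partial plagiarism because a copied fragment scores highly even when the surrounding code is unrelated.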

Appraisal method for Determining Whether to Upgrade Software for Appraisal (감정 대상 소프트웨어의 업그레이드 여부 판정을 위한 감정 방법)

  • Chun, Byung-Tae;Jeong, Younseo
    • Journal of Software Assessment and Valuation
    • /
    • v.16 no.1
    • /
    • pp.13-19
    • /
    • 2020
  • Copyright infringement cases are increasing as society becomes more complex and advanced. In a software copyright dispute, the parties may disagree over whether software was duplicated and turned into an upgraded version. In this paper, we propose an analysis method for determining whether software is an upgrade. A software similarity analysis technique was used for the upgrade analysis. The programs analyzed include server, management, and Raspberry PC programs. The first analysis confirms the correspondence between program creation information and content. It also analyzes the similarity of functions and screen composition between the submitted program and the program installed in the field. The second analysis compares the two programs by operating them in the same environment. The comparative analysis confirmed that the operation and configuration screens of the two programs were identical. Although minor differences were found in a few files, it was confirmed that the two programs were made mostly from the same or nearly identical source code. Therefore, this program can be judged to be an upgraded program.

The Study of Similarity Measure on On-Line Game Software (온라인 게임 소프트웨어 복제도 산출기법에 관한 연구)

  • Kim, Jin-Yong;Kim, Jin-Uk
    • Journal of Korea Game Society
    • /
    • v.4 no.1
    • /
    • pp.50-57
    • /
    • 2004
  • Copyright disputes over commercially successful games are increasing rapidly. Computer games are sensitive to popularity. Because games are developed in a short time and at low cost, and because technical capability is often lacking, copying and plagiarism increase, and with them copyright disputes. This paper compares and analyzes tools that can be used to analyze source code. It analyzes game source code and studies a method for calculating the degree of reproduction between an original program and a copied program. The method divides a game into modules by function according to the characteristics of the game. After separating each module into file structure, source program, and data structure, it calculates a similarity measure for each. Weights are then assigned according to the importance of each function, and a quantitative degree of reproduction is produced for the whole game program.
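
A weighted overall score of this kind can be sketched as a weighted average of per-module similarities. The module names, weights, and similarity values below are hypothetical, and the paper's actual weighting scheme is not given in the abstract.

```python
def weighted_reproduction_degree(modules):
    """Weighted average of per-module similarity scores.

    modules: iterable of (weight, similarity) pairs with similarity in [0, 1].
    """
    modules = list(modules)
    total_weight = sum(weight for weight, _ in modules)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(weight * sim for weight, sim in modules) / total_weight


if __name__ == "__main__":
    # Hypothetical functional modules: (importance weight, similarity)
    game_modules = {
        "rendering": (0.5, 0.8),
        "network": (0.3, 0.5),
        "ui": (0.2, 0.0),
    }
    print(weighted_reproduction_degree(game_modules.values()))
```

Dividing by the total weight keeps the result in [0, 1] even when the weights do not sum to one.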

Design and Implementation of Birthmark Technique for Unity Application

  • Heewan Park
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.7
    • /
    • pp.85-93
    • /
    • 2023
  • A software birthmark is a unique feature inherent in software that can be extracted from program binaries even when the original source code is unavailable. As with human genetic information, the similarity between programs can be calculated numerically, so a birthmark can be used to determine whether software has been stolen or copied. In this paper, we propose a new birthmark technique for Android applications developed using Unity. The source code of Unity-based Android applications is written in C#, and because the core logic of the program is contained in a DLL module, it must be approached differently from ordinary Android applications. We implemented a Unity birthmark extraction and comparison system and evaluated its reliability and resilience. The proposed Unity birthmark technique is expected to be effective in preventing illegal copying or code theft of Unity-based Android applications.

Efficient Similarity Measurement Technique of Windows Software using Dynamic Birthmark based on API (API 기반 동적 버스마크를 이용한 윈도우용 소프트웨어의 효율적인 유사도 측정 기법)

  • Park, Daeshin;Jie, Hyunho;Park, Youngsu;Hong, JiMan
    • Smart Media Journal
    • /
    • v.4 no.2
    • /
    • pp.34-45
    • /
    • 2015
  • Illegal copying of Windows software is a problem because Windows is the most popular operating system in the country. Illegal copying can infringe software copyright, and the software birthmark is one solution for protecting it. A software birthmark is a technique for identifying software piracy using feature information extracted from the software. Birthmarks can be divided into static and dynamic birthmarks according to the extraction method, and each type has strengths and weaknesses. In this paper, we propose a similarity measurement technique using an API-based dynamic birthmark and explain the extraction process of the dynamic birthmark. In addition, we verified through experiments that the proposed technique satisfies resilience and credibility. Furthermore, we found that the proposed technique performs better than existing measurement techniques.
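
One common way to compare API-based dynamic birthmarks is to take the k-grams of each recorded API call trace and measure the overlap of the two k-gram sets. The sketch below uses the Jaccard index for this. It is an assumed stand-in, since the abstract does not state the paper's similarity formula, and the traces are hypothetical.

```python
def kgrams(trace, k=2):
    """Set of all length-k windows of an API call trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def birthmark_similarity(trace_a, trace_b, k=2):
    """Jaccard index of the two k-gram sets, in [0, 1]."""
    grams_a, grams_b = kgrams(trace_a, k), kgrams(trace_b, k)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    # Hypothetical run-time traces of Windows API calls
    trace_1 = ["CreateFileW", "ReadFile", "CloseHandle", "RegOpenKeyExW"]
    trace_2 = ["CreateFileW", "ReadFile", "CloseHandle", "ExitProcess"]
    print(birthmark_similarity(trace_1, trace_2))
```

A larger k makes the birthmark stricter about call order, while a smaller k tolerates reordering but is easier to match by coincidence.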

Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model (굼벨 분포 모델을 이용한 표절 프로그램 자동 탐색 및 추적)

  • Ji, Jeong-Hoon;Woo, Gyun;Cho, Hwan-Gue
    • The KIPS Transactions:PartA
    • /
    • v.16A no.6
    • /
    • pp.453-462
    • /
    • 2009
  • Studies on software plagiarism detection, prevention, and judgement have become widespread due to growing interest in, and the importance of, protecting and authenticating software intellectual property. Many previous studies focused on comparing all pairs of submitted codes using attribute counting, token patterns, program parse trees, and similarity measuring algorithms. It is important to provide a clear-cut model for distinguishing plagiarism from collaboration. This paper proposes a source code clustering algorithm using a probability model based on an extreme value distribution. First, we propose an asymmetric distance measure pdist($P_a$, $P_b$) to measure the similarity of $P_a$ and $P_b$. Then, we construct the Plagiarism Direction Graph (PDG) for a given program set using pdist($P_a$, $P_b$) as edge weights, and we transform the PDG into a Gumbel Distance Graph (GDG) model, since we found that the distribution of pdist($P_a$, $P_b$) scores is similar to the well-known Gumbel distribution. Second, we newly define pseudo-plagiarism, a sort of virtual plagiarism forced by a very strong functional requirement in the specification. We conducted experiments with 18 groups of programs (more than 700 source codes) collected from the ICPC (International Collegiate Programming Contest) and KOI (Korean Olympiad for Informatics) programming contests. The experiments showed that most plagiarized codes could be detected with high sensitivity and that our algorithm successfully separated real plagiarism from pseudo-plagiarism.
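
To show how a Gumbel model can turn raw scores into a measure of how unusual a pair is, the sketch below fits location and scale by the method of moments and evaluates the upper-tail probability of a score. The scores are hypothetical, and the abstract specifies neither the definition of pdist nor the construction of the GDG, so neither is reproduced here.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(scores):
    """Method-of-moments estimates of Gumbel location (mu) and scale (beta).

    Uses mean = mu + gamma * beta and variance = (pi * beta) ** 2 / 6.
    """
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    beta = std * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def gumbel_cdf(x, mu, beta):
    """Gumbel cumulative distribution function."""
    return math.exp(-math.exp(-(x - mu) / beta))


def tail_probability(x, mu, beta):
    """Probability of observing a score at least as large as x."""
    return 1.0 - gumbel_cdf(x, mu, beta)


if __name__ == "__main__":
    scores = [0.10, 0.20, 0.15, 0.30, 0.25, 0.90]  # hypothetical pair scores
    mu, beta = fit_gumbel(scores)
    for s in scores:
        print(s, round(tail_probability(s, mu, beta), 4))
```

Pairs with a very small tail probability are the ones a reviewer would inspect first. Any threshold would have to be chosen empirically.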

A Study on the Research Model for the Standardization of Software-Similarity-Appraisal Techniques (소프트웨어 복제도 감정기법의 표준화 모델에 관한 연구)

  • Bahng, Hyo-Keun;Cha, Tae-Own;Chung, Tai-Myoung
    • The KIPS Transactions:PartD
    • /
    • v.13D no.6 s.109
    • /
    • pp.823-832
    • /
    • 2006
  • The purpose of similarity (reproduction) degree appraisal is to determine the equality or similarity between two programs; it is a system that presents, through expert eyes, the technical grounds of judgment needed to support the resolution of disputes over software intellectual property rights. The most important things in software appraisal are to avoid relying too heavily on the expert's subjective judgment and to obtain accurate appraisal results. However, systematic appraisal techniques have not yet been properly researched, developed, or standardized, and because each expert may approach a case in a different way, even the techniques for each type of software appraisal have not been clearly presented. Moreover, from a practical analysis of the results of all previously completed appraisal cases, we know that objectivity and accuracy were compromised in parts of the appraisal results, owing to problems with existing appraisal procedures and techniques or a lack of professional knowledge on the expert's part. In this paper, we present a model for the standardization of software-similarity-appraisal techniques and objective evaluation methods that reduce the variation in results that different experts may produce for the same evaluation points. In particular, it analyzes and evaluates the techniques from various points of view: the standard appraisal process, setting the range of appraisal, setting detailed appraisal domains and items based on unit processes, setting the weight of each object to be appraised, and the degree of logical and physical similarity, based on effective solutions to practical problems of existing appraisal techniques and their objective and quantitative standardization. Consequently, we believe that the model will minimize the possibility of mistakes due to an expert's subjective judgment and will offer a tool for improving the objectivity and reliability of appraisal results.

A Functional Unit Dynamic API Birthmark for Windows Programs Code Theft Detection (Windows 프로그램 도용 탐지를 위한 기능 단위 동적 API 버스마크)

  • Choi, Seok-Woo;Cho, Woo-Young;Han, Tai-Sook
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.9
    • /
    • pp.767-776
    • /
    • 2009
  • A software birthmark is a set of characteristics extracted from a program itself to detect code theft. A dynamic API birthmark is extracted from the run-time API call sequences of a program. The dynamic Windows API birthmarks of Tamada et al. are extracted from API call sequences during the startup period of a program. Therefore, these dynamic birthmarks cannot reflect the characteristics of the main functions of the program. In this paper, we propose a functional unit birthmark (FDAPI), defined as the API call sequences recorded during the execution of essential functions of a program. To find out whether some functional units of a program were copied from an original program, two FDAPIs are extracted by executing the programs with the same input. The FDAPIs are compared using the semi-global alignment algorithm to compute the similarity between the two programs. Programs with the same functionality are compared to show the credibility of our birthmark. Binary executables compiled differently from the same source code are compared to demonstrate the resilience of our birthmark. The experimental results show that our birthmark can detect module theft of software, to which the existing birthmarks of Tamada et al. cannot be applied.
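
Semi-global alignment differs from global alignment in that gaps at the start and end of either sequence are free, so a short call sequence embedded in a longer one still aligns well. The sketch below is a minimal version over API call names. The scoring values, the normalization, and the example traces are assumptions rather than the paper's parameters.

```python
def semi_global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Alignment score with free leading and trailing gaps in either sequence."""
    prev = [0] * (len(b) + 1)        # first row is 0: leading gaps are free
    best_last_col = prev[-1]
    for x in a:
        cur = [0]                    # first column is 0: leading gaps are free
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        best_last_col = max(best_last_col, cur[-1])
        prev = cur
    # Trailing gaps are free: take the best cell in the last row or column.
    return max(best_last_col, max(prev))


def fdapi_similarity(a, b, match=1):
    """Score normalized by the best possible score of the shorter trace."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return max(0.0, semi_global_score(a, b, match=match) / (match * shorter))


if __name__ == "__main__":
    # Hypothetical API call traces recorded for one functional unit
    suspect = ["LoadLibraryW", "CreateFileW", "ReadFile", "WriteFile", "CloseHandle"]
    original = ["CreateFileW", "ReadFile", "WriteFile"]
    print(semi_global_score(suspect, original))
    print(fdapi_similarity(suspect, original))
```

Here the shorter trace appears intact inside the longer one, so the unpenalized end gaps let it reach the maximum score for its length.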