• Title/Summary/Keyword: source code similarity

Search Result 47, Processing Time 0.022 seconds

Similarity Detection for Large Scale Software Using Abstracted Source Code (소스코드 요약을 이용한 대규모 소프트웨어 유사도 평가)

  • Park, Seong-Soo;Han, Hwan-Soo
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2012.06a
    • /
    • pp.39-41
    • /
    • 2012
  • 프로그램 코드의 유사도 측정에 대한 방법은 여러 가지 존재하고 있으며 유사도 측정 프로그램도 많이 존재한다. 이런 프로그램 유사도 측정 도구는 중소규모 소프트웨어 프로젝트에 많이 사용되고 있으나, 실제 대규모 소프트웨어의 유사도 검사를 위해서 사용하기에는 한계가 존재한다. 지금까지 대규모 소프트웨어의 유사도를 측정할 수 있는 객관적 방법이 거의 제시되지 않고 있어, 본 논문에서는 대규모 소프트웨어의 소스코드를 요약하여 서로 다른 프로그램의 유사도를 측정하는 방법을 제시한다.

Source code Plagiarism Detection with Recursive Local Alignments (재귀적 지역정렬을 이용한 프로그램 표절 탐색)

  • 전명재;이평준;조환규
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.04a
    • /
    • pp.946-948
    • /
    • 2004
  • 지역정렬(local alignment)과 전체정렬(global alignment)로 대표되는 정렬 문제는 전산학 분야의 전형적인 문제로, 두 서열의 전체적인 또는 부문적인 유사성(similarity)을 찾아 주기 위한 방법이다. 특히 정렬은 두 문자열에서 유사하게 나타나는 유사 서브스트링을 찾아내는 문제라든가 근래의 생물정보학에서 두 DNA시퀀스간의 유사도를 판별하는 문제 등에서 매우 중요란 기법이다. 본 논문에서는 두 서열들을 유사하게 매칭 시켜 주는 기존의 정렬 방법을 응용, 변형하여 C, C++. JAVA등으로 짜여진 프로그램 소스들의 유사도를 측정하는 방법을 제시하였다. 실제로 이런 프로그램 소스의 표절은 대학교육 수업과정 등에서 빈번하게 발생되는 문제점으로서 본 논문에서는 프로그램 소스표절을 검사, 탐지할 수 있는 방법론 및 구체적인 프로그램과 그 결과를 제시하고 있다. 아울러 두 프로그램간의 유사성을 비교하기 위해 기존의 지역정렬 방법을 보다 효율적으로 적절히 변형시키는 방법을 제시하고 있다.

  • PDF

Management of Reliability and Delivery for Software Object Material (소프트웨어 목적물의 전달체계 분석과 신뢰성 검증)

  • Kim, Do-Hyeun;Lee, Kyu-Tae
    • Journal of Software Assessment and Valuation
    • /
    • v.15 no.2
    • /
    • pp.51-57
    • /
    • 2019
  • On increasing illegal software copyright, the need for similarity analysis is now rising. The reliability of object material are becoming important when it's moving from developer to evaluation experts. Object material as a comparison data, is the important data to the evaluation expert which is delivered from agencies such as courts and police stations. The object material is submitted at first to the Copyright Commission and then delivered to the evaluation expert with safe. However, if the similarity result is not satisfied to the both side, they will claim to the reliability of the object material such as source code modification or revision etc. Software objects is produced in a file format and are recognized as being able to be modified. Therefore, the reliability to the object material is studied in various ways, and a forensic is proposed as one method. This study showed the suggestion to keep reliability of the object material through the actual evaluation cases.

Authorship Attribution Framework Using Survival Network Concept : Semantic Features and Tolerances (서바이벌 네트워크 개념을 이용한 저자 식별 프레임워크: 의미론적 특징과 특징 허용 범위)

  • Hwang, Cheol-Hun;Shin, Gun-Yoon;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.30 no.6
    • /
    • pp.1013-1021
    • /
    • 2020
  • Malware Authorship Attribution is a research field for identifying malware by comparing the author characteristics of unknown malware with the characteristics of known malware authors. The authorship attribution method using binaries has the advantage that it is easy to collect and analyze targeted malicious codes, but the scope of using features is limited compared to the method using source code. This limitation has the disadvantage that accuracy decreases for a large number of authors. This study proposes a method of 'Defining semantic features from binaries' and 'Defining allowable ranges for redundant features using the concept of survival network' to complement the limitations in the identification of binary authors. The proposed method defines Opcode-based graph features from binary information, and defines the allowable range for selecting unique features for each author using the concept of a survival network. Through this, it was possible to define the feature definition and feature selection method for each author as a single technology, and through the experiment, it was confirmed that it was possible to derive the same level of accuracy as the source code-based analysis with an improvement of 5.0% accuracy compared to the previous study.

A Method for Detecting Program Plagiarism Comparing Class Structure Graphs (클래스 구조 그래프 비교를 통한 프로그램 표절 검사 방법)

  • Kim, Yeoneo;Lee, Yun-Jung;Woo, Gyun
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.11
    • /
    • pp.37-47
    • /
    • 2013
  • Recently, lots of research results on program comparison have been reported since the code theft become frequent as the increase of code mobility. This paper proposes a plagiarism detection method using class structures. The proposed method constructs a graph representing the referential relationship between the member variables and the methods. This relationship is shown as a bipartite graph and the test for graph isomorphism is applied on the set of graphs to measure the similarity of the programs. In order to measure the effectiveness of this method, an experiment was conducted on the test set, the set of Java source codes submitted as solutions for the programming assignments in Object-Oriented Programming course of Pusan National University in 2012. In order to evaluate the accuracy of the proposed method, the F-measure is compared to those of JPlag and Stigmata. According to the experimental result, the F-measure of the proposed method is higher than those of JPlag and Stigmata by 0.17 and 0.34, respectively.

Technology Clustering Using Textual Information of Reference Titles in Scientific Paper (과학기술 논문의 참고문헌 텍스트 정보를 활용한 기술의 군집화)

  • Park, Inchae;Kim, Songhee;Yoon, Byungun
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.43 no.2
    • /
    • pp.25-32
    • /
    • 2020
  • Data on patent and scientific paper is considered as a useful information source for analyzing technological information and has been widely utilized. Technology big data is analyzed in various ways to identify the latest technological trends and predict future promising technologies. Clustering is one of the ways to discover new features by creating groups from technology big data. Patent includes refined bibliographic information such as patent classification code whereas scientific paper does not have appropriate bibliographic information for clustering. This research proposes a new approach for clustering data of scientific paper by utilizing reference titles in each scientific paper. In this approach, the reference titles are considered as textual information because each reference consists of the title of the paper that represents the core content of the paper. We collected the scientific paper data, extracted the title of the reference, and conducted clustering by measuring the text-based similarity. The results from the proposed approach are compared with the results using existing methodologies that one is the approach utilizing textual information from titles and abstracts and the other one is a citation-based approach. The suggested approach in this paper shows statistically significant difference compared to the existing approaches and it shows better clustering performance. The proposed approach will be considered as a useful method for clustering scientific papers.

Study on MPI-based parallel sequence similarity search in the LINUX cluster (클러스터 환경에서의 MPI 기반 병렬 서열 유사성 검색에 관한 연구)

  • Hong, Chang-Bum;Cha, Jeoung-Ho;Lee, Sung-Hoon;Shin, Seung-Woo;Park, Keun-Joon;Park, Keun-Young
    • Journal of the Korea Society of Computer and Information
    • /
    • v.11 no.6 s.44
    • /
    • pp.69-78
    • /
    • 2006
  • In the field of the bioinformatics, it plays an important role in predicting functional information or structure information to search similar sequence in biological DB. Biolrgical sequences have been increased dramatically since Human Genome Project. At this point, because the searching speed for the similar sequence is highly regarded as the important factor for predicting function or structure, the SMP(Sysmmetric Multi-Processors) computer or cluster is being used in order to improve the performance of searching time. As the method to improve the searching time of BLAST(Basic Local Alighment Search Tool) being used for the similarity sequence search, We suggest the nBLAST algorithm performing on the cluster environment in this paper. As the nBLAST uses the MPI(Message Passing Interface), the parallel library without modifying the existing BLAST source code, to distribute the query to each node and make it performed in parallel, it is possible to easily make BLAST parallel without complicated procedures such as the configuration. In addition, with the experiment performing the nBLAST in the 28 nodes of LINUX cluster, the enhanced performance according to the increase in the number of the nodes has been confirmed.

  • PDF