Applying Genomic Sequence Alignment Methodology for Source Codes Plagiarism Detection

Journal of KIISE: Computing Practices and Letters (한국정보과학회논문지:컴퓨팅의 실제 및 레터)
Volume 9, Issue 3, pp. 352-367, 2003
ISSN 1229-7712 (print)
Korean Institute of Information Scientists and Engineers (한국정보과학회)

Korean title: 유전체 서열의 정렬 기법을 이용한 소스 코드 표절 검사

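강은미 (Department of Computer Science, Pusan National University)
황미녕 (Korea Institute of Science and Technology Information)
조환규 (School of Computer Science and Engineering, Pusan National University)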

Published: 2003.06.01

Abstract

The syntactic and semantic characteristics of a computer program can be represented by the sequence of keywords extracted from its source code. Therefore, the similarities and differences between two programs can be identified clearly by comparing the keyword sequences obtained from them. Various methods for measuring the similarity of two sequences have already been studied intensively in bioinformatics for the analysis of biological genetic sequences. In this paper, we propose a new method that measures the similarity of two programs and detects partial plagiarism by exploiting sequence alignment techniques. To evaluate the performance of the proposed method, we experimented with actual program code submitted by 70 students attending a data structures course in 2001. The experimental results show that the proposed method is more effective and powerful than the fingerprint method, which is the most commonly used technique for plagiarism detection.
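The pipeline described in the abstract (reduce each program to its keyword sequence, then score the two sequences with an alignment algorithm) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword set `KEYWORDS`, the scores `match=2`, `mismatch=-1`, `gap=-1`, and the normalization in `similarity` are hypothetical, and plain Needleman-Wunsch global alignment stands in for whichever alignment variant and scoring scheme the paper actually uses.

```python
import re

# Hypothetical keyword vocabulary: a subset of C reserved words.
# The abstract does not list the keywords the authors actually extract.
KEYWORDS = {
    "if", "else", "for", "while", "do", "switch", "case", "default",
    "return", "break", "continue", "goto", "int", "char", "float",
    "double", "void", "struct", "union", "typedef",
}

# Block comments, line comments and double-quoted string literals.
_NOISE = re.compile(r'/\*.*?\*/|//[^\n]*|"(?:\\.|[^"\\])*"', re.S)


def keyword_sequence(source: str) -> list[str]:
    """Return the keywords of a C-like source file, in order of appearance."""
    source = _NOISE.sub(" ", source)              # strip comments and strings
    words = re.findall(r"[A-Za-z_]\w*", source)   # identifier-like tokens
    return [w for w in words if w in KEYWORDS]    # keep only keywords


def global_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Needleman-Wunsch score of sequences a and b with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]   # row for an empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def similarity(a, b, match=2, mismatch=-1, gap=-1):
    """Hypothetical normalization: score / (match * longer length), clamped at 0.

    Identical sequences score 1.0; unrelated ones approach 0.0.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    score = global_alignment_score(a, b, match, mismatch, gap)
    return max(0.0, score / (match * longest))
```

Because identifiers, literals, and comments are discarded, renaming variables or adding comments leaves the keyword sequence unchanged, so such a pair scores 1.0, while rewriting a `for` loop as a `while` loop lowers the score without driving it to zero.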

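Korean abstract (English translation): The structural and syntactic characteristics of a typical computer program can be represented by the sequence of keywords extracted from its source code. Comparing the extracted keyword sequences therefore gives a good picture of the similarities and differences between two programs. Various methods for measuring sequence similarity have been actively researched in bioinformatics, the field that deals with biological gene sequences. In this paper, we propose a new method that measures the similarity between two programs and detects partial plagiarism using sequence alignment. To evaluate the performance of the proposed method, we ran plagiarism checks using, as experimental data, the programs submitted by students enrolled in a data structures course in 2001. The experimental results showed that the proposed technique is more effective than the fingerprint method, the most widely used approach to plagiarism detection.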
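Partial plagiarism, where only one fragment of a program is copied, calls for local rather than global alignment, since the unrelated parts of the two programs should not be penalized. The following is a minimal sketch, assuming the standard Smith-Waterman recurrence with hypothetical scores (`match=2`, `mismatch=-1`, `gap=-1`) applied to keyword sequences that have already been extracted. The abstract does not say which local alignment variant or scoring the authors used, and the function name `local_alignment` is illustrative only.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment of two keyword sequences.

    Scores are hypothetical. Returns (score, (a_start, a_end), (b_start, b_end)),
    where the ranges are half-open indices of the best-scoring aligned regions.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # row/column 0 stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment start fresh at any position.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_i), (j, best_j)
```

The returned ranges locate the suspected copied fragment in each program (`a[a_start:a_end]` and `b[b_start:b_end]`), which can then be inspected by hand; a score of 0 means no shared fragment scored above zero.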
