Search | Korea Science

A Study of Natural Language Plagiarism Detection

Ahn, Byung-Ryul;Kim, Heon;Kim, Moon-Hyun
- Proceedings of the Korea Society of Information Technology Applications Conference / 2005.11a / pp.325-329 / 2005
A vast amount of information is generated and shared in today's digital era. As digitalization advances, most documents now exist in digital form, and the volume of such information keeps growing. It is no exaggeration to say that this newly created information and knowledge will affect the competitiveness and the future of our nation. Accordingly, substantial investment is being made in information- and knowledge-based industries at the national level, and considerable effort goes into research and the development of human resources. Creating and sharing information has become easier in the digital era thanks to the internet and the many tools developed for authoring documents, and as a result the volume of shared information is increasing day by day. At present, much of the information provided online is plagiarized or illegally copied. Plagiarism is particularly hard to identify within such a large volume of information because the original sentences can simply be restructured or have words replaced with similar ones, which makes them look different from the originals. Managing and protecting knowledge is therefore coming to be regarded as just as important as creating it through investment and effort. This work proposes a new method and theory for effectively detecting infringement and plagiarism of others' intellectual property. DICOM (Dynamic Incremental Comparison Method), the document plagiarism detection method developed in this research, aims to realize a system that detects plagiarized documents and passages efficiently, accurately, and immediately by creating positive and various detectors.

Hierarchical Clustering Methodology for Source Code Plagiarism Detection (계층적 군집화 기법을 이용한 소스 코드 표절 검사)

Sohn, Ki-Rack;Moon, Seung-Mi
- Journal of The Korean Association of Information Education / v.11 no.1 / pp.91-98 / 2007
Plagiarism is a serious problem in school education, made easier by current technologies such as the internet and word processors. This paper presents a way to detect source code plagiarism using similarity based on string comparison methods. The main contribution is the use of hierarchical agglomerative clustering to classify plagiarism groups, which are then visualized as a dendrogram. Graders can set an empirical threshold on the dendrogram to navigate plagiarism groups. We evaluated the performance of the presented method on real-world data. The results showed the usefulness and applicability of this method.
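
The clustering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity measure (`difflib` ratio), the average-linkage rule, and the names `similarity` and `cluster` are all assumptions chosen for illustration. Merging stops once the best remaining linkage falls below the grader's threshold, which corresponds to cutting the dendrogram at that height.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; difflib's ratio is an assumed stand-in."""
    return SequenceMatcher(None, a, b).ratio()


def cluster(sources: dict[str, str], threshold: float) -> list[set[str]]:
    """Average-linkage agglomerative clustering that stops below `threshold`."""
    names = list(sources)
    # Pairwise similarities between all submissions, computed once.
    sim = {frozenset(pair): similarity(sources[pair[0]], sources[pair[1]])
           for pair in combinations(names, 2)}
    clusters = [{name} for name in names]

    def linkage(c1: set[str], c2: set[str]) -> float:
        # Average similarity over all cross-cluster pairs.
        values = [sim[frozenset((x, y))] for x in c1 for y in c2]
        return sum(values) / len(values)

    while len(clusters) > 1:
        # Pick the two most similar clusters (i < j by construction).
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) < threshold:
            break  # equivalent to cutting the dendrogram at the threshold
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    submissions = {
        "alice": "for i in range(10): total += i",
        "bob": "for j in range(10): total += j",
        "carol": "while queue: node = queue.pop()",
    }
    print(cluster(submissions, threshold=0.8))
```

Lowering the threshold merges more submissions into each group, so a grader can tune it empirically, as the abstract suggests.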

Program Plagiarism Detection through Memory Access Log Analysis (메모리 액세스 로그 분석을 통한 프로그램 표절 검출)

Park, Sung-Yun;Han, Sang-Yong
- The KIPS Transactions:PartD / v.13D no.6 s.109 / pp.833-838 / 2006
Program plagiarism is an infringement of software copyright. Many source program comparison methods have been studied for detecting program plagiarism. However, it is not easy to detect a plagiarized program in which only a few cosmetic changes have been made to program structures and variable names. In this paper, we propose a new technique for detecting plagiarism through memory access log analysis.
https://doi.org/10.3745/KIPSTD.2006.13D.6.833

A Two Phases Plagiarism Detection System for the Newspaper Articles by using a Web Search and a Document Similarity Estimation (웹 검색과 문서 유사도를 활용한 2 단계 신문 기사 표절 탐지 시스템)

Cho, Jung-Hyun;Jung, Hyun-Ki;Kim, Yu-Seop
- The KIPS Transactions:PartB / v.16B no.2 / pp.181-194 / 2009
With growing interest in document copyright, much research on document plagiarism has been carried out. Plagiarism of newspaper articles has attracted particular attention because articles with high commercial value are now plagiarized very often. Existing document plagiarism research has been hard to apply to newspaper articles because of their strong real-time nature. As a result, detecting plagiarized articles has required many human inspectors to manually read each of the thousands of articles published by hundreds of newspaper companies. In this paper, we first select the articles most likely to have been copied by using the OpenAPI modules provided by web search companies such as Naver and Daum. We then measure the document similarity between the selected articles and the original article, and the system decides whether each article was plagiarized. In the experiment, we used Yonhap News articles as the originals, and the system selected the suspicious articles from all articles returned by the Naver and Daum news search services.
https://doi.org/10.3745/KIPSTB.2009.16-B.2.181
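
The two-phase flow described above (candidate retrieval, then similarity scoring) might be sketched as follows. This is a sketch under stated assumptions: the abstract does not name the similarity measure, so bag-of-words cosine similarity is used as a stand-in, and `search` is a hypothetical callable standing in for the web search step rather than any real Naver or Daum API call.

```python
import math
from collections import Counter
from typing import Callable, Iterable


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors (an assumed measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def two_phase_detect(original: str,
                     search: Callable[[str], Iterable[str]],
                     threshold: float = 0.8) -> list[str]:
    """Return candidate articles judged to be copies of `original`."""
    # Phase 1: ask the (hypothetical) search backend for candidate articles.
    candidates = search(original)
    # Phase 2: keep only candidates whose similarity passes the threshold.
    return [c for c in candidates
            if cosine_similarity(original, c) >= threshold]


if __name__ == "__main__":
    corpus = [
        "the central bank raised interest rates on tuesday morning",
        "local team wins the championship game",
    ]

    def local_search(query: str) -> list[str]:
        # Toy stand-in for a web search: any article sharing a word matches.
        words = set(query.lower().split())
        return [doc for doc in corpus if words & set(doc.lower().split())]

    print(two_phase_detect(
        "the central bank raised interest rates on tuesday", local_search))
```

Keeping the search step behind a plain callable means the similarity phase can be tested offline, independent of whichever search service supplies the candidates.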

Applying Genomic Sequence Alignment Methodology for Source Codes Plagiarism Detection (유전체 서열의 정렬 기법을 이용한 소스 코드 표절 검사)

강은미;황미녕;조환규
- Journal of KIISE:Computing Practices and Letters / v.9 no.3 / pp.352-367 / 2003
The syntactic and semantic characteristics of a computer program can be represented by the keyword sequence extracted from its source code. The similarity and difference between two programs can therefore be determined by comparing the keyword sequences obtained from them. Methods for measuring the similarity of two sequences have already been studied intensively in bioinformatics for the manipulation of biological genetic sequences. In this paper, we propose a new method for measuring the similarity of two programs and detecting partial plagiarism by exploiting sequence alignment techniques. To evaluate the performance of the proposed method, we experimented with actual program code submitted by 70 students attending a Data Structure course in 2001. The experimental results show that the proposed method is more effective and powerful than the fingerprint method, which is the most commonly used for plagiarism detection.
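
As an illustration of aligning keyword sequences, the sketch below extracts keywords from two C-like snippets and computes a Smith-Waterman local alignment score. The abstract does not specify the keyword set or the scoring scheme, so the `KEYWORDS` subset and the match, mismatch, and gap values here are assumptions, as are the function names.

```python
import re

# Assumed subset of C keywords; the paper's actual keyword list is not given.
KEYWORDS = {"int", "for", "while", "if", "else", "return", "break"}


def keyword_sequence(source: str) -> list[str]:
    """Keep only language keywords, in order, discarding identifiers."""
    return [tok for tok in re.findall(r"[A-Za-z_]\w*", source)
            if tok in KEYWORDS]


def local_alignment_score(a: list[str], b: list[str],
                          match: int = 1, mismatch: int = -1,
                          gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between two sequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Scores are clamped at 0 so an alignment can restart anywhere.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    p1 = "int f(int n){ for(;;){ if(n) return n; } while(n) break; }"
    p2 = "int g(int k){ int z; for(;;){ if(k) return k; } }"
    print(local_alignment_score(keyword_sequence(p1), keyword_sequence(p2)))
```

Because identifiers are discarded before alignment, renaming variables does not change the score, and the local (rather than global) alignment lets a copied fragment score highly even inside otherwise different programs.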

A Study on Efficient Program Plagiarism Detection (효율적인 프로그램 표절 탐지에 관한 연구)

Ahn Byung-Ryul;Kim Moon-Hyun
- Annual Conference of KIPS / 2006.05a / pp.147-150 / 2006
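This paper presents a method and theory for effectively detecting plagiarism of program source code implemented in various languages. We analyze the strengths and weaknesses of existing program plagiarism detection software and, as a way to overcome the weaknesses in particular, introduce a plagiarism detection method based on pattern matching. We then introduce a more advanced automatic plagiarism detection system that overcomes the problems found in existing pattern-matching methods.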

Developing of Text Plagiarism Detection Model using Korean Corpus Data (한글 말뭉치를 이용한 한글 표절 탐색 모델 개발)

Ryu, Chang-Keon;Kim, Hyong-Jun;Cho, Hwan-Gue
- Journal of KIISE:Computing Practices and Letters / v.14 no.2 / pp.231-235 / 2008
Recently we have witnessed several plagiarism scandals involving academic papers and novels. Document plagiarism is becoming more frequent and more serious. Although plagiarism in English text has been studied for a long time, systematic and complete studies of plagiarism in Korean documents are hard to find. Since the linguistic features of Korean are quite different from those of English, English-based methods cannot be applied to Korean documents directly. In this paper, we propose a new plagiarism detection method for Korean, and we thoroughly tested our algorithm on a benchmark Korean text corpus. The proposed method is based on "k-mer" and "local alignment", which locate the plagiarized regions of document pairs quickly and accurately. Using a Korean corpus that contains more than 10 million words, we establish a probability model for the local alignment score (random similarity by chance). The experiment showed that our system was quite successful at detecting plagiarized documents.
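
The k-mer half of this approach can be sketched briefly. The abstract does not say how k-mers are formed for Korean or how they are scored, so this sketch assumes character-level k-mers over whitespace-stripped text and uses Jaccard overlap as an illustrative score. The local alignment step and the corpus-based probability model are not reproduced here.

```python
def kmers(text: str, k: int = 3) -> set[str]:
    """All length-k character substrings, ignoring whitespace (assumed)."""
    chars = "".join(text.split())
    return {chars[i:i + k] for i in range(len(chars) - k + 1)}


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard overlap of the two k-mer sets (an illustrative score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


if __name__ == "__main__":
    doc1 = "표절 검사 시스템을 개발하였다"
    doc2 = "표절 검사 시스템을 제안하였다"
    print(kmer_similarity(doc1, doc2))
```

Working on characters rather than space-delimited words sidesteps Korean word segmentation, which is one plausible reason a k-mer representation suits the language. Document pairs with high k-mer overlap would then be passed to the more expensive local alignment.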

An Adaptive Algorithm for Plagiarism Detection in a Controlled Program Source Set (제한된 프로그램 소스 집합에서 표절 탐색을 위한 적응적 알고리즘)

Ji, Jeong-Hoon;Woo, Gyun;Cho, Hwan-Gue
- Journal of KIISE:Software and Applications / v.33 no.12 / pp.1090-1102 / 2006
This paper suggests a new algorithm for detecting plagiarism among a set of source codes constrained to be functionally equivalent, such as those submitted for a programming assignment or a programming contest problem. The algorithms typically used up to now are based on Greedy-String Tiling, which seeks perfect matches of substrings, and on similarity analysis based on the local alignment of two strings. This paper introduces a new method for detecting similar intervals of the given programs based on an adaptive similarity matrix, each entry of which is the logarithm of the probability of a keyword, derived from keyword frequencies in the given set of programs. We tested this method on sets of programs submitted for more than 10 real programming contests. The experimental results show several advantages of this method over the previous one, which uses a fixed similarity matrix (+1 for match, -1 for mismatch, -2 for gap), and show that the adaptive similarity matrix can be used to detect various plagiarism cases.
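
To make the adaptive scoring idea concrete, the sketch below derives a per-keyword match score from keyword frequencies in a submission set. The abstract only says that entries are logarithms of keyword probabilities; the exact formula is not given, so the negative log probability used here (rare keywords score higher) and the name `adaptive_match_scores` are assumptions.

```python
import math
from collections import Counter


def adaptive_match_scores(programs: list[list[str]]) -> dict[str, float]:
    """Match score per keyword: -log of its relative frequency (assumed form)."""
    counts = Counter(tok for seq in programs for tok in seq)
    total = sum(counts.values())
    # A keyword that is rare across the whole set is stronger evidence
    # of copying when it matches, so it receives a larger score.
    return {tok: -math.log(count / total) for tok, count in counts.items()}


if __name__ == "__main__":
    submissions = [
        ["int", "int", "for"],
        ["int", "goto"],
    ]
    for keyword, score in sorted(adaptive_match_scores(submissions).items()):
        print(f"{keyword}: {score:.3f}")
```

Scores like these would replace the fixed +1 match value in a local alignment. In a set of functionally equivalent programs, keywords that every solution must use then contribute little to the alignment score, while unusual shared constructs contribute a lot.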

Program Plagiarism Detection based on X-treeDiff+ (X-treeDiff+ 기반의 프로그램 복제 탐지)

Lee, Suk-Kyoon
- Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.4 / pp.44-53 / 2010
Program plagiarism is a significant factor in reducing the quality of education in computer programming. In this paper, we propose a technique for identifying similar or identical programs in order to prevent students from recklessly copying programming assignments. Existing approaches for identifying similar programs are mainly based on fingerprints or pattern matching for text documents. Unlike those approaches, we propose an approach based on program structure. Using parsing programs, we first transform programs into XML documents by representing the syntactic components of the programs as elements of an XML document, then run X-tree Diff+, a change detection algorithm for XML documents, to produce an edit script describing the changes. Whether programs are similar or identical is decided by analyzing the edit scripts in terms of program plagiarism. Analysis of the edit scripts lets users understand the process of conversion between two programs, so that they can make a qualitative judgment that considers the characteristics of the programming assignment and the degree of plagiarism.

Enhancing the performance of code-clone detection tools using code2vec (code2vec을 이용한 유사도 감정 도구의 성능 개선)

Um, Taeho;Hong, Sung Moon;Yang, Joon Hyuk;Jang, Hyo Seok;Doh, Kyung-Goo
- Journal of Software Assessment and Valuation / v.17 no.1 / pp.31-40 / 2021
Plagiarism refers to the act of using original material as if it were one's own without revealing the source. Plagiarism of source code causes a variety of problems, including legal disputes. Plagiarism in software projects is usually determined by measuring similarity through comparison of every pair of source files across two projects. However, blindly comparing every pair imposes a huge computational burden, which is a major reason why more accurate tools are not used. If we compare only the pairs that are likely to be clones and eliminate the pairs that cannot be clones, we can concentrate on improving detection accuracy. In this paper, we propose a method for selecting highly probable clone-pair candidates by pre-classifying suspected source code using a machine-learning model called code2vec.
https://doi.org/10.29056/jsav.2021.06.05
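
The candidate-selection step can be pictured with a small sketch. It does not call code2vec itself: it assumes each source file has already been mapped to an embedding vector by some model, and it assumes cosine similarity with a fixed threshold as the filter. The function names and the threshold value are hypothetical.

```python
import math
from itertools import product


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def candidate_pairs(project_a: dict[str, list[float]],
                    project_b: dict[str, list[float]],
                    threshold: float = 0.9) -> list[tuple[str, str]]:
    """File pairs similar enough to pass on to an expensive clone detector."""
    return [(fa, fb)
            for (fa, va), (fb, vb) in product(project_a.items(),
                                              project_b.items())
            if cosine(va, vb) >= threshold]


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for precomputed embeddings.
    a = {"a.c": [1.0, 0.0]}
    b = {"x.c": [0.9, 0.1], "y.c": [0.0, 1.0]}
    print(candidate_pairs(a, b))
```

Only the surviving pairs would then go through the accurate but costly similarity tool, which is the computational saving the abstract describes.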

