A Study of Natural Language Plagiarism Detection

  • Ahn, Byung-Ryul (School of Information Communication Engineering, SungKyunKwan University) ;
  • Kim, Heon (School of Information Communication Engineering, SungKyunKwan University) ;
  • Kim, Moon-Hyun (School of Information Communication Engineering, SungKyunKwan University)
  • Published : 2005.11.25

Abstract

A vast amount of information is generated and shared in the digital era. As digitization advances, most documents now exist in digital form, and the volume of such information keeps growing. It is no exaggeration to say that this newly created information and knowledge will affect the competitiveness and future of our nation. Accordingly, substantial investment is being made in information- and knowledge-based industries at the national level, and considerable effort is being devoted to research and to the development of human resources. Creating and sharing information has become easier in the digital era because of the Internet and the many tools developed for creating documents; as a result, the share of duplicated information is increasing day by day. At present, much of the information provided online is in fact plagiarized or illegally copied. Plagiarism is particularly hard to identify within a tremendous amount of information because original sentences can simply be restructured or have words replaced with similar ones, which makes them look different from the originals. Consequently, managing and protecting knowledge is coming to be regarded as no less important than creating it through investment and effort. This dissertation proposes a new method and theory intended to help detect infringement and plagiarism of others' intellectual property effectively. DICOM (Dynamic Incremental Comparison Method), a method developed in this research to detect document plagiarism, focuses on realizing a system that can detect plagiarized documents and passages efficiently, accurately, and immediately by creating positive and varied detectors.

Keywords