A Method for Measuring Similarity Measure of Thesaurus Transformation Documents using DBSCAN

Kim, Byeongsik; Shin, Juhyun

doi:10.9717/kmms.2018.21.9.1035

Journal of Korea Multimedia Society (한국멀티미디어학회논문지)

Volume 21, Issue 9, pp. 1035-1043, 2018. pISSN 1229-7771, eISSN 2384-0102

Korea Multimedia Society (한국멀티미디어학회)


Kim, Byeongsik (김병식; Dept. of Software Convergence Engineering, Chosun University)
Shin, Juhyun (신주현; Dept. of ICT Convergence, Chosun University)

Received : 2018.04.09
Accepted : 2018.06.20
Published : 2018.09.30


Abstract

Plagiarism sometimes takes the form of presenting the core content of another person's work as one's own ideas, rewording it without citing the source. The free CopyKiller service, which is commonly used for plagiarism checks, flags a passage when six or more consecutive words match. However, a six-word match is not a sufficient criterion when words have been replaced with similar words. Therefore, in this paper, we construct word clusters using the DBSCAN algorithm, find synonyms, convert the words in each cluster into a representative synonym, and construct L-R tables through L-R parsing. We then propose a method for determining the similarity of documents by applying weights to the synonyms in the thesaurus and weights to each paragraph of the thesis.
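The synonym-conversion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D word vectors, the `eps` and `min_pts` values, the choice of the alphabetically first cluster member as the representative synonym, and the use of bag-of-words cosine similarity are all assumptions made for illustration. The L-R parsing step and the synonym and paragraph weights are omitted.

```python
import math
from collections import Counter

# Hypothetical 2-D word vectors (assumption); a real system would learn
# embeddings from a corpus rather than hard-code them.
VECTORS = {
    "method": (0.0, 0.0), "approach": (0.1, 0.0), "technique": (0.0, 0.1),
    "measure": (5.0, 5.0), "estimate": (5.1, 5.0), "gauge": (5.0, 5.1),
    "banana": (9.0, 0.0),
}

def region_query(words, word, eps):
    """Return all words (including `word` itself) within distance eps of `word`."""
    return [w for w in words if math.dist(VECTORS[w], VECTORS[word]) <= eps]

def dbscan(words, eps, min_pts):
    """Plain DBSCAN; returns {word: cluster_id}, with -1 marking noise."""
    labels = {}
    cluster_id = -1
    for word in words:
        if word in labels:
            continue
        neighbours = region_query(words, word, eps)
        if len(neighbours) < min_pts:
            labels[word] = -1  # provisionally noise; may become a border point
            continue
        cluster_id += 1
        labels[word] = cluster_id
        queue = [n for n in neighbours if n != word]
        while queue:
            cand = queue.pop()
            if labels.get(cand) == -1:
                labels[cand] = cluster_id  # noise reached from a core point
            if cand in labels:
                continue
            labels[cand] = cluster_id
            cand_neighbours = region_query(words, cand, eps)
            if len(cand_neighbours) >= min_pts:  # cand is a core point: expand
                queue.extend(cand_neighbours)
    return labels

def representatives(labels):
    """Map each clustered word to a representative synonym.
    Assumption: the alphabetically first member represents the cluster."""
    clusters = {}
    for word, cid in labels.items():
        if cid != -1:
            clusters.setdefault(cid, []).append(word)
    return {w: min(members) for members in clusters.values() for w in members}

def normalise(text, rep):
    """Replace every token that belongs to a cluster with its representative."""
    return [rep.get(tok, tok) for tok in text.lower().split()]

def cosine(a, b):
    """Cosine similarity between two token lists using term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
        math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

labels = dbscan(list(VECTORS), eps=0.2, min_pts=2)
rep = representatives(labels)

doc_a = "we measure similarity with this method"
doc_b = "we estimate similarity with this approach"
raw = cosine(doc_a.split(), doc_b.split())
converted = cosine(normalise(doc_a, rep), normalise(doc_b, rep))
print(f"raw={raw:.2f} converted={converted:.2f}")
```

In this toy example the two sentences differ only in substituted synonyms, so an exact word match scores them as only partly similar, while the score after conversion to representative synonyms rises to approximately 1.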

