
Machine Learning-Based Authorship Attribution for Historical Texts: A Case Study of GaeByeok Magazine in the 1920s

기계학습을 이용한 역사 텍스트의 저자판별 : 1920년대 『개벽』 잡지의 논설 텍스트

  • Published : 2018

Abstract

This study demonstrates how authorship attribution techniques can be applied to historical texts. It explores the potential of authorship attribution as a solution to real-world authorship disputes and the possibility of multidisciplinary research that combines the humanities with quantitative text analytics. History and literary studies have traditionally addressed authorship problems by judging the similarity of topics and subject matter or by relying on extra-textual information. This subjective and anecdotal approach needs to be complemented by an objective, quantitative methodology that examines intra-textual clues. As a first case study, we performed a machine learning-based authorship attribution analysis of 164 opinion texts of unknown authorship from GaeByeok magazine of the 1920s. To enhance the accuracy and reliability of the analysis, we devised an improved machine learning algorithm based on SVM that incorporates three parameters, α, β, and θ, into the prediction model. The study also shows how to perform authorship attribution in an open setting rather than a closed one. We hope that the prediction results will encourage and facilitate more productive discussion among related disciplines on the identification and verification of authorship in real historical texts.
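
To make the closed versus open distinction concrete, the following is a minimal sketch and not the method of the paper. The abstract does not define α, β, or θ, so they are not reproduced here. As a stated simplification, cosine similarity over character bigram counts stands in for the SVM, and a single hypothetical `threshold` argument decides whether a text is left unattributed. The author names and sample sentences are invented toy data.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=2):
    """Count overlapping character n-grams, a common stylometric feature."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram Counters."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(text, profiles, threshold=None):
    """Return the best-matching candidate author, or None if rejected.

    threshold=None  -> closed setting: some candidate is always returned.
    threshold=float -> open setting: return None when even the best
                       candidate scores below the threshold.
    """
    query = char_ngrams(text)
    scores = {author: cosine(query, prof) for author, prof in profiles.items()}
    best = max(scores, key=scores.get)
    if threshold is not None and scores[best] < threshold:
        return None
    return best


# Toy candidate profiles (hypothetical authors and texts, for illustration only).
profiles = {
    "author_a": char_ngrams("the sea the sea the rolling sea and the shore"),
    "author_b": char_ngrams("quarterly profits rose sharply amid strong exports"),
}

similar_text = "the sea and the shore"
unrelated_text = "zzz qqq xxx jjj"

# Closed setting: even an unrelated text is forced onto some candidate.
closed_guess = attribute(unrelated_text, profiles)

# Open setting: the unrelated text is rejected, the similar one is attributed.
open_guess = attribute(unrelated_text, profiles, threshold=0.5)
open_match = attribute(similar_text, profiles, threshold=0.5)
```

In the closed setting the classifier must name one of the known candidates, which is misleading when the true author is not among them. The open setting adds the option of answering "none of these", which matters for texts of unknown authorship where the candidate list cannot be assumed complete. The value 0.5 is arbitrary here; in practice a rejection threshold would need to be tuned on texts of known authorship.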

Keywords