A Method for Clustering Noun Phrases into Coreferents for the Same Person in Novels Translated into Korean

한국어 번역 소설에서 인물명 명사구의 동일인물 공통참조 클러스터링 방법

  • Park, Taekeun (Dept. of Applied Computer Engineering, Dankook University)
  • Kim, Seung-Hoon (Dept. of Applied Computer Engineering, Dankook University)
  • Received : 2016.12.12
  • Accepted : 2017.01.17
  • Published : 2017.03.30

Abstract

The character names that appear in novels vary widely with the genre, the spatio-temporal setting, and the nationality of the characters. Moreover, characters and their names are products of the author's imagination. As a result, no proper noun dictionary can cover every character name. In addition, novels translated into Korean contain character names that consist of two or more nouns (such as "Harry Potter"). In this paper, we propose a method that extracts noun phrases for character names and clusters those noun phrases into coreferents of the same character. To extract the noun phrases, we use the KKMA morpheme analyzer and the CPFoAN character identification tool. To cluster them, we construct a directed graph from the character names extracted by CPFoAN and the extracted noun phrases, and then create a name set for each character by traversing the connected subgraphs of that graph. We evaluate the proposed method through a survey on four novels translated into Korean. The results suggest that the proposed method can be useful for speaker identification as well as for constructing the social network of characters.
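The clustering step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how edges are chosen, so the rule used here (an edge from a character name to any noun phrase that contains it as a token) is an assumption, as are the function names `build_graph` and `name_sets` and the English sample data. "Connected subgraphs" is interpreted as weakly connected components, that is, edge direction is ignored during traversal.

```python
def build_graph(character_names, noun_phrases):
    """Build a directed graph as a dict: node -> set of successor nodes.

    Assumed edge rule (not from the paper): name -> phrase whenever the
    name appears as a whitespace-separated token of the phrase.
    """
    names = set(character_names)
    graph = {name: set() for name in names}
    for phrase in noun_phrases:
        graph.setdefault(phrase, set())
        for token in phrase.split():
            if token in names and token != phrase:
                graph[token].add(phrase)
    return graph


def name_sets(graph):
    """Return one set of names per weakly connected component."""
    # Ignore edge direction so that a phrase links all names it contains.
    undirected = {node: set() for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            undirected[src].add(dst)
            undirected[dst].add(src)

    seen = set()
    clusters = []
    for start in undirected:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(undirected[node] - seen)
        clusters.append(component)
    return clusters


# Hypothetical inputs standing in for CPFoAN names and extracted noun phrases.
names = ["Harry", "Potter", "Ron"]
phrases = ["Harry Potter", "Ron Weasley"]
clusters = name_sets(build_graph(names, phrases))
for cluster in sorted(sorted(c) for c in clusters):
    print(cluster)
```

With these sample inputs, "Harry", "Potter", and "Harry Potter" fall into one name set, and "Ron" and "Ron Weasley" into another. A real pipeline would operate on Korean morpheme analysis output rather than whitespace tokens.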

References

  1. D.K. Elson and K.R. McKeown, "Automatic Attribution of Quoted Speech in Literary Narrative," Proceedings of the 24th AAAI Conference on Artificial Intelligence, pp. 1013-1019, 2010.
  2. D.K. Elson, N. Dames, and K.R. McKeown, "Extracting Social Networks from Literary Fiction," Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 138-147, 2010.
  3. E. Iosif and T. Mishra, "From Speaker Identification to Affective Analysis: A Multi-Step System for Analyzing Children's Stories," Proceedings of the 3rd Workshop on Computational Linguistics for Literature, pp. 40-49, 2014.
  4. Stanford CoreNLP - A Suite of Core NLP Tools, http://nlp.stanford.edu/software/corenlp.shtml, (accessed Nov. 28, 2016).
  5. T. Park and S.H. Kim, "A Character Identification Method Using Postpositions for Animate Nouns in Korean Novels," Journal of Information Technology Services, Vol. 15, No. 3, pp. 115-125, 2016.
  6. T. Park and S.H. Kim, "A Character Identification Method Utilizing Connective and Possessive Forms of Animate Nouns in Novels Translated into or Written in Korean," IEICE Transactions on Information and Systems, 2016.
  7. E.Y. Lee, "Named Entity Detection and Relation Extraction in the Personal Chronology of the 19th Century," Journal of EONEOHAG, Vol. 53, pp. 141-162, 2009.
  8. G.M. Park, S.H. Kim, and H.G. Cho, "Analysis of Social Network According to the Distance of Character Statements," Journal of the Korea Contents Association, Vol. 13, No. 4, pp. 427-439, 2013. https://doi.org/10.5392/JKCA.2013.13.04.427
  9. B.H. Back, I. Ha, and B.C. Ahn, "An Extraction Method of Sentiment Information from Unstructured Big Data on SNS," Journal of Korea Multimedia Society, Vol. 17, No. 6, pp. 671-680, 2014. https://doi.org/10.9717/kmms.2014.17.6.671
  10. D.J. Lee, J.H. Yeon, I.B. Hwang, and S.G. Lee, "KKMA: A Tool for Utilizing Sejong Corpus Based on Relational Database," Journal of KIISE: Computing Practices and Letters, Vol. 16, No. 11, pp. 1046-1050, 2010.