Cluster Analysis Study based on Content Types of <Heungbu-jeon> versions

<흥부전> 이본의 내용 유형에 따른 군집 분석 연구

  • 최운호 (Department of Korean Language and Literature, Mokpo National University)
  • 김동건 (Humanitas College, Kyung Hee University)
  • Received : 2023.08.27
  • Accepted : 2023.10.11
  • Published : 2023.10.30

Abstract

This study analyzes the similarities and dissimilarities among versions of <Heungbu-jeon> at both the micro and macro levels, using content analysis techniques and the Hamming distance metric. The 28 versions of <Heungbu-jeon> were segmented into 341 content units, and the content type of each unit was encoded as a value. For each content unit, the content types were compared across all versions to measure their dissimilarity. These unit-level (dis-)similarities among the 28 versions were then aggregated into a distance matrix. Multidimensional scaling was applied to this matrix to obtain two-dimensional coordinates. Visualizing the multidimensional scaling results confirmed that the versions of <Heungbu-jeon> can be broadly divided into two groups. Hierarchical clustering and phylogenetic analysis were then applied to the same distance matrix to analyze how the 28 versions cluster. The results showed that, at the micro level, the two major groups comprise five clusters. This study demonstrates the usefulness of digital humanities methods for encoding the content of classical literary versions and for analyzing the resulting data with clustering techniques based on the (dis-)similarity of literary content.
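The abstract describes the encoding, distance, and clustering steps only in prose. The following is a minimal sketch of how they could fit together, not the authors' code. It assumes each version is encoded as an equal-length list of integer content-type codes (one per content unit), uses 0 as a hypothetical "unit absent" code, divides each Hamming count by the number of units, and uses average linkage (UPGMA), which the abstract does not specify. The function names and the four-version toy data are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Count content units whose encoded content type differs between two versions."""
    if len(a) != len(b):
        raise ValueError("versions must be encoded over the same content units")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(versions, normalize=True):
    """Symmetric matrix of pairwise Hamming distances, optionally divided by unit count."""
    names = sorted(versions)
    n = len(names)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        a, b = versions[names[i]], versions[names[j]]
        dist = hamming(a, b) / (len(a) if normalize else 1)
        d[i][j] = d[j][i] = dist
    return names, d


def average_linkage_clusters(names, d, k):
    """Agglomerative clustering with average linkage (UPGMA), stopped at k clusters."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(names))]
    while len(clusters) > k:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            # Mean distance over all cross-cluster pairs of versions.
            pair_dists = [d[i][j] for i in clusters[x] for j in clusters[y]]
            avg = sum(pair_dists) / len(pair_dists)
            if best is None or avg < best[0]:
                best = (avg, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so position x is unchanged
    return [sorted(names[i] for i in c) for c in clusters]


# Hypothetical toy encoding (not the study's data): 4 versions x 5 content units.
# Each integer is a content-type code; 0 marks a unit absent from a version (assumed).
versions = {
    "A": [1, 1, 2, 0, 1],
    "B": [1, 1, 2, 0, 2],
    "C": [2, 1, 3, 0, 2],
    "D": [2, 1, 3, 0, 1],
}
names, d = distance_matrix(versions)
print(average_linkage_clusters(names, d, k=2))  # [['A', 'B'], ['C', 'D']]
```

Cutting the same tree at a larger `k` would give finer groupings, which is the kind of micro-level split the abstract reports; the actual linkage rule and cut used in the study may differ.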

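This study aims to analyze the series (계열) and lineages (계통) of the versions of <Heungbu-jeon> at both the micro and macro levels by applying content analysis and Hamming distance measurement. The 28 versions of <Heungbu-jeon> were segmented into content paragraphs, the content type of each paragraph was encoded as a value, and the type differences were compared across all versions. To analyze how closely the versions are related, the differences in content-paragraph types among the 28 versions were aggregated and converted into a distance matrix. Multidimensional scaling, a dimensionality-reduction technique, was applied to this distance matrix, reducing it to a two-dimensional space and yielding two-dimensional coordinates. Visualizing the multidimensional scaling results confirmed that the versions of <Heungbu-jeon> fall broadly into two lineages. Hierarchical cluster analysis and phylogenetic (cladistic) analysis were then applied to the same distance matrix to cluster the 28 versions by closeness of relationship. The results confirmed that, according to the micro-level analysis of these relationships, the two lineages contain five series. By applying digital humanities methods to encode the content of classical literary versions and to analyze the resulting data, this study showed that cluster analysis based on the content similarity of texts is useful.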
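The multidimensional scaling step can be illustrated with a small sketch. The abstract does not say which MDS variant was used; this assumes classical (Torgerson) MDS and, to stay within the Python standard library, a cyclic Jacobi eigen-solver. `jacobi_eigh`, `classical_mds`, and the toy matrix are hypothetical illustrations, not the study's implementation or data.

```python
import math


def jacobi_eigh(a, max_sweeps=100, tol=1e-20):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V); column k of V is the unit eigenvector for eigenvalues[k].
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation (c, s) that zeroes the a[p][q] entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds(d, dims=2):
    """Classical (Torgerson) MDS: coordinates whose distances approximate d."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J, with J = I - (1/n) * ones.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    eigvals, eigvecs = jacobi_eigh(b)
    top = sorted(range(n), key=lambda k: eigvals[k], reverse=True)[:dims]
    # Negative eigenvalues (possible for non-Euclidean distances) are clipped to 0.
    return [[eigvecs[i][k] * math.sqrt(max(eigvals[k], 0.0)) for k in top]
            for i in range(n)]


# Hypothetical 4 x 4 distance matrix (not the study's data): the corner-to-corner
# distances of a 3 x 4 rectangle, so an exact 2-D embedding exists.
names = ["A", "B", "C", "D"]
dist = [
    [0.0, 3.0, 5.0, 4.0],
    [3.0, 0.0, 4.0, 5.0],
    [5.0, 4.0, 0.0, 3.0],
    [4.0, 5.0, 3.0, 0.0],
]
coords = classical_mds(dist)
for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:+.3f}, {y:+.3f})")
```

Hamming distances between encoded versions need not be Euclidean, so a real matrix can yield negative eigenvalues; clipping them is one common convention, and a metric or non-metric MDS routine from a statistics library would be a reasonable alternative.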
