
Comparison of Topic Modeling Methods for Analyzing Research Trends of Archives Management in Korea: Focused on LDA and HDP


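  • JunHyeong Park (박준형), Department of Archives and Records Management, Graduate School, Chonbuk National University
  • Hyo-Jung Oh (오효정), Department of Archives and Records Management and Cultural Convergence Archiving Research Institute, Chonbuk National University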
  • Received : 2017.11.16
  • Accepted : 2017.12.16
  • Published : 2017.12.30

Abstract

The purpose of this study is to analyze research trends in archives management in Korea by applying and comparing two topic modeling methods: LDA (Latent Dirichlet Allocation), a text mining technique that has recently attracted wide attention, and HDP (Hierarchical Dirichlet Process) topic modeling, a variant of LDA. First, we collected 1,027 archives management articles published between 1997 and 2016 in two Korean archives management journals and four Korean library and information science journals, and preprocessed them. We then performed LDA and HDP topic modeling separately. For a more in-depth comparison, we used LDAvis, a topic modeling visualization tool, to display the distances between topics and to analyze the representative keywords of each topic in detail. In the results, LDA topics were strongly influenced by term frequency, so the major keywords representing the domain as a whole appeared throughout the topics, whereas HDP produced many specific keywords that make the characteristics of each topic easy to identify. This suggests that LDA is effective for deriving the macro-level topics that represent archives management in Korea as a whole, while HDP is effective for deriving the micro-level core keywords of each detailed topic.
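LDAvis ranks the terms of each topic by a "relevance" score that trades off a term's probability within the topic against its lift (that probability divided by the term's overall corpus probability). Sievert and Shirley (2014) define relevance(w, k | λ) = λ log φ_kw + (1 - λ) log(φ_kw / p_w), so λ = 1 favours frequent, generic terms and λ = 0 favours topic-specific ones, which is one way to see the frequency effect described above. The minimal pure-Python sketch below assumes two invented toy topic-term distributions over a five-word English vocabulary (hypothetical values, not data or results from this study). It also computes the Jensen-Shannon divergence, the measure LDAvis uses by default for inter-topic distance before projecting topics onto its 2-D map. In practice φ and p_w would come from a fitted LDA or HDP model rather than being typed in by hand.

```python
import math

# Toy topic-term distributions phi_k(w) = P(w | topic k).
# HYPOTHETICAL values for illustration only; they are not results from the study.
topic_a = {"records": 0.30, "archives": 0.20, "digital": 0.25, "preservation": 0.20, "appraisal": 0.05}
topic_b = {"records": 0.35, "archives": 0.25, "digital": 0.05, "preservation": 0.05, "appraisal": 0.30}

# Hypothetical share of each topic in the corpus; the marginal term probability
# p(w) is then the topic-weighted mixture of the topic-term distributions.
topic_weights = (0.5, 0.5)
corpus_p = {
    w: topic_weights[0] * topic_a[w] + topic_weights[1] * topic_b[w] for w in topic_a
}


def relevance(phi_kw, p_w, lam):
    """LDAvis relevance: lam * log(phi_kw) + (1 - lam) * log(phi_kw / p_w)."""
    return lam * math.log(phi_kw) + (1 - lam) * math.log(phi_kw / p_w)


def top_terms(phi_k, p, lam, n):
    """Return the n terms of one topic ranked by relevance, highest first."""
    scores = {w: relevance(phi_k[w], p[w], lam) for w in phi_k}
    return sorted(scores, key=scores.get, reverse=True)[:n]


def kl_divergence(x, y):
    """KL(x || y) with natural log; terms with zero probability in x contribute 0."""
    return sum(x[w] * math.log(x[w] / y[w]) for w in x if x[w] > 0)


def js_divergence(a, b):
    """Jensen-Shannon divergence between two distributions over the same vocabulary."""
    m = {w: 0.5 * (a[w] + b[w]) for w in a}
    return 0.5 * kl_divergence(a, m) + 0.5 * kl_divergence(b, m)


# lam = 1: rank purely by within-topic probability, so the generic term "records" leads.
print(top_terms(topic_a, corpus_p, lam=1.0, n=2))  # ['records', 'digital']
# lam = 0: rank purely by lift, so topic-specific terms come first.
print(top_terms(topic_a, corpus_p, lam=0.0, n=2))  # ['digital', 'preservation']
# Inter-topic distance of the kind LDAvis projects onto its 2-D topic map.
print(js_divergence(topic_a, topic_b))
```

Intermediate values of λ interpolate between the two rankings; LDAvis exposes λ as an interactive slider so the analyst can move between frequent and distinctive keywords for each topic.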

References

  1. Choi, Yilang. 2015. A Study on the Research Trends of Archival Studies in Korea : Focused on Research Papers between 2004 and 2013. The Korean Journal of Archival Studies, 43: 147-177.
  2. Gyu-Hwan Kim, Young-Joon Nam. 2009. A Study on the Research Trends of Records and Archives Management in Korea through an Analysis of Journal Articles. Journal of The Korean Society for Library and Information Science, 43(4): 217-239. https://doi.org/10.4275/KSLIS.2009.43.4.217
  3. Gyu-Hwan Kim, Bo-Seong Jang, Hyun-Jung Yi. 2009. A Study on Intellectual Structure of Records Management and Archives in Korea : Based on Syntactic and Semantic Structure of Article Titles. Journal of The Korean Society for Library and Information Science, 43(3): 417-439. https://doi.org/10.4275/KSLIS.2009.43.3.417
  4. Namgyu Kim, Donghoon Lee, Hochang Choi, William Xiu Shun Wong. 2017. Investigations on Techniques and Applications of Text Analytics. The Journal of Communications and Information Sciences, 42(2): 471-492.
  5. Kim, Sang Kyoum. 2016. A Study on the Research Trends in Domestic Industrial Engineering using Topic Modeling. master's thesis, Seoul National University of Science and Technology, Seoul, Korea.
  6. Seol A Jin, Min Song. 2016. Topic Modeling based Interdisciplinarity Measurement in the Informatics Related Journals. Journal of the Korean Society for Information Management, 33(1): 7-32. https://doi.org/10.3743/KOSIM.2016.33.1.007
  7. So-Young Yu. 2015. Combining Ego-centric Network Analysis and Dynamic Citation Network Analysis to Topic Modeling for Characterizing Research Trends. Journal of the Korean Society for Information Management, 32(1): 153-169. https://doi.org/10.3743/KOSIM.2015.32.1.153
  8. TaeKyung Kim, HoeRyeon Choi, HongChul Lee. 2016. A Study on the Research Trends in Fintech using Topic Modeling. The Journal of Korea Academy Industrial Cooperation Society, 17(11): 670-681.
  9. Jae-Yun Lee, Ju-Young Moon, Hee-Jung Kim. 2007. Examining the Intellectual Structure of Records Management & Archival Science in Korea with Text Mining. Journal of The Korean Society for Library and Information Science, 41(1): 345-372. https://doi.org/10.4275/KSLIS.2007.41.1.345
  10. Nahm, Choon-Ho. 2016. An Illustrative Application of Topic Modeling Method to a Farmer's Diary. Cross-Cultural Studies, 22(1): 89-135.
  11. Sang-Tae Na, Joo-Eon Ahn, Min-Ho Jung, Ja-Hee Kim. 2017. Research Trend Analysis for Smart Grids Using Dynamic Topic Modeling. The transactions of The Korean Institute of Electrical Engineers, 66(4): 613-620. https://doi.org/10.5370/KIEE.2017.66.4.613
  12. Tea-Woo Nam, Jin-Young Lee. 2009. A Study on the Research Trends of Records and Archives Management in Korea. Journal of Korean Library and Information Science Society, 40(2): 451-472.
  13. Ja-Hyun Park, Min Song. 2013. A Study on the Research Trends in Library & Information Science in Korea using Topic Modeling. Journal of the Korean Society for Information Management, 30(1): 7-32. https://doi.org/10.3743/KOSIM.2013.30.1.007
  14. Park Ju Seop, Hong Soon-Goo, Kim Jong-Weon. 2017. A Study on Science Technology Trend and Prediction Using Topic Modeling. Journal of the Korea Industrial Information Systems Research, 22(4): 19-28. https://doi.org/10.9723/JKSIIS.2017.22.4.019
  15. Records Management & Archives Society of Korea. 2013. Records & Archives Management. Seongnam: Asian cultural history.
  16. Seo, Seong Hun. 2016. Fintech trend analysis using topic modeling of BM patents. master's thesis, Seoul National University of Science and Technology, Seoul, Korea.
  17. Shin, Kyoo-Sik, Choi, Hoe-Ryeon, Lee, Hong-Chul. 2015. Topic Model Analysis of Research Trend on Renewable Energy. The Journal of Korea Academy Industrial Cooperation Society, 16(9): 6411-6418. https://doi.org/10.5762/KAIS.2015.16.9.6411
  18. Hye In Sohn, Young Joon Nam. 2016. A Study on the Research Trends of Archives Management in Korea : Focused on the Journal of Records - Management & Archives Society of Korea and The Korean Journal of Archival Studies. Journal of the Korean Society for Information Management, 33(1): 85-110. https://doi.org/10.3743/KOSIM.2016.33.1.085
  19. Carson Sievert and Kenneth E. Shirley. 2014. LDAvis: A method for visualizing and interpreting topics. Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, Baltimore, Maryland.
  20. Chong Wang, John Paisley and David M. Blei. 2011. Online Variational Inference for the Hierarchical Dirichlet Process. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL.
  21. David M. Blei. 2012. Probabilistic Topic Models. Communications of the ACM, 55(4): 77-84. https://doi.org/10.1145/2133806.2133826
  22. David M. Blei, Andrew Y. Ng and Michael I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3: 993-1022.
  23. Gensim Home Page. [cited 2017. 9. 15].
  24. Jason Chuang, Christopher D. Manning and Jeffrey Heer. 2012. Termite: Visualization Techniques for Assessing Textual Topic Models. Advanced Visual Interfaces, 12: 21-25.
  25. Konlpy Home Page. [cited 2017. 9. 15].
  26. Loet Leydesdorff and Adina Nerghes. 2017. Co-word Maps and Topic Modeling: A Comparison Using Small and Medium-Sized Corpora (N<1,000). Journal of the Association for Information Science and Technology, 68(4): 1024-1035. https://doi.org/10.1002/asi.23740
  27. Teh, Yee Whye, Michael I. Jordan, Matthew J. Beal and David M. Blei. 2006. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476): 1566-1581. https://doi.org/10.1198/016214506000000302

Cited by

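  1. Analysis of Research Status and Trends in the Journals of the Society for Industrial and Applied Mathematics (SIAM) Using Text Mining Techniques, vol. 20, no. 7, 2020, https://doi.org/10.5392/jkca.2020.20.07.212
  2. A Comparative Analysis of Domestic and International Research Trends on Public Data Based on Topic Modeling, vol. 19, no. 2, 2021, https://doi.org/10.14400/jdc.2021.19.2.001
  3. Analysis of Research Trends in Sustainable Fashion Using Topic Modeling, vol. 29, no. 4, 2021, https://doi.org/10.29049/rjcc.2021.29.4.538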