Analyzing data-related policy programs in Korea using text mining and network cluster analysis

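  • 최성준 (Technology Management, Economics and Policy Program, Seoul National University)
  • 신기윤 (Innovative Enterprise Research Division, Science and Technology Policy Institute)
  • 오윤환 (New Industry Strategy Research Division, Science and Technology Policy Institute)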
  • Received : 2023.10.07
  • Accepted : 2023.11.17
  • Published : 2023.12.31

Abstract

This study classifies and categorizes similar policy programs through network cluster analysis of textual information on data-related policy programs in Korea. Descriptions of data-related budgetary programs carried out in South Korea in 2022 were collected, and keywords were extracted from the program contents. The similarity between each pair of programs was then computed using TF-IDF, and a policy program network was constructed from these similarities. The structural characteristics of the network were analyzed, and similar policy programs were grouped and categorized through network clustering. The analysis of 97 programs identified 7 major clusters, showing that programs with similar themes or objectives were grouped by application area or by the services in which data are used. The findings show the current status of data-related policy programs in Korea, offer policy implications for a strategic approach to planning future national data strategies and programs, and contribute to the establishment of evidence-based policy.

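The pipeline described in the abstract (keywords, TF-IDF similarity, network construction, clustering) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the four program names and their keyword lists are hypothetical, the similarity threshold of 0.1 is arbitrary, and connected components are used as a deliberately simple stand-in for the network clustering step, whose exact algorithm and parameters the abstract does not specify.

```python
import math
from collections import Counter

# Hypothetical keyword lists standing in for keywords extracted
# from program descriptions (not data from the study).
programs = {
    "health_data_platform": ["data", "health", "hospital", "platform"],
    "medical_ai_dataset": ["data", "health", "hospital", "ai"],
    "smart_farm_data": ["data", "farm", "crop", "sensor"],
    "agri_sensor_network": ["data", "farm", "sensor", "network"],
}


def tfidf_vectors(docs):
    """Return {name: {term: weight}} with weight = tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        vectors[name] = {
            term: (count / len(terms)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_network(vectors, threshold):
    """Link two programs when their similarity exceeds the threshold."""
    names = sorted(vectors)
    edges = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) > threshold:
                edges[a].add(b)
                edges[b].add(a)
    return edges


def connected_components(edges):
    """Group nodes into components (a simple stand-in for clustering)."""
    seen, clusters = set(), []
    for start in sorted(edges):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


vectors = tfidf_vectors(programs)
network = build_network(vectors, threshold=0.1)  # threshold is an assumption
clusters = connected_components(network)
print(clusters)
```

In this toy input the keyword "data" occurs in every program, so its TF-IDF weight is zero and it does not link the health programs to the agriculture programs. A modularity-based community detection method could replace `connected_components` when the network is denser than this example.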