A Study on the Development Trend of Artificial Intelligence Using Text Mining Technique: Focused on Open Source Software Projects on Github

  • Chong, JiSeon (Department of Business Administration, Graduate School, Hanyang University) ;
  • Kim, Dongsung (Department of Business Administration, Graduate School, Hanyang University) ;
  • Lee, Hong Joo (Department of Business Administration, The Catholic University of Korea) ;
  • Kim, Jong Woo (School of Business, Hanyang University)
  • 정지선 (한양대학교 일반대학원 경영학과) ;
  • 김동성 (한양대학교 일반대학원 경영학과) ;
  • 이홍주 (가톨릭대학교 경영학과) ;
  • 김종우 (한양대학교 경영대학 경영학부)
  • Received : 2019.01.18
  • Accepted : 2019.03.11
  • Published : 2019.03.31

Abstract

Artificial intelligence (AI) is one of the main driving forces of the Fourth Industrial Revolution. AI technologies have already shown abilities equal to or better than those of people in many fields, including image and speech recognition. Because AI technologies can be applied in a wide range of fields, including medicine, finance, manufacturing, services, and education, many efforts have been made to identify current technology trends and analyze their development directions. Major platforms for developing complex AI algorithms for learning, reasoning, and recognition have been released to the public as open source projects, and the technologies and services that use them have increased rapidly as a result. This has been identified as one of the major reasons for the fast development of AI technologies. Additionally, the spread of the technology owes much to open source software (OSS) developed by major global companies that supports natural language recognition, speech recognition, and image recognition. Therefore, this study aimed to identify practical trends in AI technology development by analyzing AI-related OSS projects, which are developed through the online collaboration of many parties. This study searched for and collected a list of major AI-related projects created on Github from 2000 to July 2018. It then examined the development trends of major technologies in detail by applying text mining techniques to topic information, which indicates the characteristics and technical fields of the collected projects. The results showed that fewer than 100 projects were created per year until 2013. The number then increased to 229 projects in 2014 and 597 projects in 2015. In particular, the number of AI-related open source projects increased rapidly in 2016 (2,559 OSS projects).
The number of projects initiated in 2017 was 14,213, almost four times the total number of projects created from 2009 to 2016 (3,555 projects). The number of projects initiated from January to July 2018 was 8,737. The development trend of AI-related technologies was evaluated by dividing the study period into three phases. The appearance frequency of topics indicates the technology trends of AI-related OSS projects. The results showed that natural language processing remained at the top in all years, which implies that related OSS has been developed continuously. Until 2015, the programming languages Python, C++, and Java were among the ten most frequent topics. After 2016, however, programming languages other than Python disappeared from the top ten. In their place, platforms that support the development of AI algorithms, such as TensorFlow and Keras, show high appearance frequency. Additionally, reinforcement learning algorithms and convolutional neural networks, which have been used in various fields, appeared frequently as topics. The results of the topic network analysis showed that the topics with the highest degree centrality were similar to those with the highest appearance frequency. The main difference was that the visualization and medical imaging topics ranked near the top of the degree centrality list even though they were not near the top of the appearance frequency list from 2009 to 2012. This indicates that OSS was being developed to apply AI technology in the medical field. Moreover, although computer vision was in the top 10 of the appearance frequency list from 2013 to 2015, it was not in the top 10 of the degree centrality list. The topics at the top of the degree centrality list were similar to those at the top of the appearance frequency list; only the ranks of convolutional neural networks and reinforcement learning changed slightly.
The trend of technology development was examined using the appearance frequency and degree centrality of topics. The results showed that machine learning had the highest frequency and the highest degree centrality in all years. It is also noteworthy that, although the deep learning topic showed low frequency and low degree centrality between 2009 and 2012, its rank rose abruptly between 2013 and 2015. In recent years, both technologies have had high appearance frequency and degree centrality. TensorFlow first appeared during the 2013-2015 phase, and its appearance frequency and degree centrality soared between 2016 and 2018, placing it near the top of the lists after deep learning and Python. Computer vision and reinforcement learning did not show an abrupt increase or decrease, and they had relatively low appearance frequency and degree centrality compared with the topics above. Based on these results, it is possible to identify the fields in which AI technologies are actively being developed. The results of this study can be used as a baseline dataset for further empirical analysis of future technology trends and their convergence.
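
The two measures used above, appearance frequency and degree centrality of topics per phase, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the sample `projects` list, the topic labels, and the function names are hypothetical, and the network is assumed to link two topics whenever they are attached to the same project. Only the three phases (2009-2012, 2013-2015, 2016-2018) come from the abstract.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: (year created, topic labels) for each project.
projects = [
    (2011, ["machine-learning", "java", "natural-language-processing"]),
    (2014, ["machine-learning", "deep-learning", "python"]),
    (2015, ["deep-learning", "computer-vision", "python"]),
    (2017, ["machine-learning", "deep-learning", "tensorflow", "python"]),
    (2017, ["tensorflow", "keras", "deep-learning"]),
    (2018, ["reinforcement-learning", "tensorflow", "python"]),
]

# The three phases named in the abstract.
PHASES = [(2009, 2012), (2013, 2015), (2016, 2018)]


def phase_of(year):
    """Return the (start, end) phase containing the year, or None."""
    for start, end in PHASES:
        if start <= year <= end:
            return (start, end)
    return None


def topic_frequency(projects):
    """Count, per phase, the number of projects carrying each topic."""
    freq = {phase: Counter() for phase in PHASES}
    for year, topics in projects:
        phase = phase_of(year)
        if phase is not None:
            freq[phase].update(set(topics))
    return freq


def degree_centrality(projects):
    """Normalized degree centrality in a per-phase topic co-occurrence network.

    Assumption: two topics are linked if they appear on the same project.
    Centrality = number of distinct neighbors / (number of topics - 1).
    """
    graphs = {phase: {} for phase in PHASES}
    for year, topics in projects:
        phase = phase_of(year)
        if phase is None:
            continue
        graph = graphs[phase]
        unique = sorted(set(topics))
        for topic in unique:
            graph.setdefault(topic, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    result = {}
    for phase, graph in graphs.items():
        n = len(graph)
        result[phase] = {
            topic: (len(adj) / (n - 1) if n > 1 else 0.0)
            for topic, adj in graph.items()
        }
    return result


if __name__ == "__main__":
    freq = topic_frequency(projects)
    cent = degree_centrality(projects)
    for phase in PHASES:
        print(phase, freq[phase].most_common(3))
        top = sorted(cent[phase].items(), key=lambda kv: -kv[1])[:3]
        print(phase, top)
```

Ranking topics by each measure within a phase, and then comparing the two rankings, gives the kind of top-10 comparison the abstract describes. A topic can rank differently on the two lists because frequency counts projects, while degree centrality counts how many distinct topics it co-occurs with.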

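Artificial intelligence technology, one of the main driving forces of the Fourth Industrial Revolution, has shown abilities similar to or better than those of people in many fields such as image and speech recognition, and it is drawing considerable attention because of the wide-ranging impact it is expected to have on society as a whole. In particular, because AI technology can be applied in a broad range of fields including medicine, finance, manufacturing, services, and education, efforts to identify current technology trends and analyze development directions are also being actively made. Meanwhile, one of the main factors behind the rapid development of AI technology is that major platforms for developing complex AI algorithms for learning, reasoning, and recognition have been released as open source, leading to a dramatic increase in the development of technologies and services that use them. In addition, AI software with natural language recognition, speech recognition, and image recognition functions developed by major global companies has been released free of charge as open source software (OSS), contributing greatly to the spread of the technology. Accordingly, this study analyzes major AI-related open source software projects that are developed online through the collaboration of many participants, in order to identify more practical trends in the current state of AI technology development. To this end, we searched for and collected a list of major AI-related projects created on Github from 2000 to July 2018, and applied text mining techniques to the topic information, which represents the characteristics and technical fields of the collected projects, to examine the development trends of major technologies in detail by year. The analysis showed that AI-related open source software has increased sharply since 2016, and the analysis of relationships among topics confirmed that the major technology trends can be classified into the categories of 'algorithms', 'programming languages', 'application fields', and 'development tools'. Based on these results, it will be possible to distinguish and examine in more detail the AI-related technologies being developed for use in various fields, and the results can be used to explore effective development directions and to analyze changing trends.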

Figure 1. Research Procedure

Figure 2. Technique Trend Matrix

Figure 3. Number of projects by year

Figure 4. Technique Trend Matrix of Artificial Intelligence

Table 1. Review of research related to AI technology trend analysis using patent and literature data

Table 2. Frequency of Artificial Intelligence Related Topics Top 10

Table 3. Degree Centrality of Artificial Intelligence Related Topics Top 10
