Topic Modeling on Patent and Article Big Data Using BERTopic and Analyzing Technological Trends of AI Semiconductor Industry

Text Mining-Based Analysis of AI Semiconductor Technology and Research Trends Using BERTopic

  • Received : 2024.01.08
  • Accepted : 2024.02.28
  • Published : 2024.02.29


The Fourth Industrial Revolution has spurred widespread adoption of AI-based services, driving global interest in AI semiconductors that can perform large-scale computation efficiently. Text mining research, which historically relied on latent Dirichlet allocation (LDA), has evolved by integrating machine learning, as exemplified by BERTopic, introduced in 2021. This study applies BERTopic to AI semiconductor-related patents and research publications, generating 48 topics from 2,256 patents and 40 topics from 1,112 publications. The results offer insight into technology trends, but the study is limited by its macro-level view of the entire AI semiconductor industry. As the industry matures, future research may focus on specific technologies to obtain more nuanced insights.


