• Title/Summary/Keyword: Bertopic

Evaluation of Topic Modeling Performance for Overseas Construction Market Analysis Using LDA and BERTopic on News Articles (LDA 및 BERTopic 기반 해외건설시장 뉴스 기사 토픽모델링 성능평가)

  • Baik, Joonwoo;Chung, Sehwan;Chi, Seokho
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.6 / pp.811-819 / 2023
  • Understanding the local conditions is a crucial factor in enhancing the success potential of overseas construction projects. This can be achieved through the analysis of news articles of the target market using topic modeling techniques. In this study, the authors aimed to analyze news articles using two topic modeling methods, namely Latent Dirichlet Allocation (LDA) and BERTopic, in order to determine the optimal approach for market condition analysis. To evaluate the alignment between the generated topics and the actual themes of the news documents, the research collected 6,273 BBC news articles, created ground truth data for individual news article topics, and finally compared this ground truth with the results of the topic modeling. The F1 score for LDA was 0.011, while BERTopic achieved a score of 0.244. These results indicate that BERTopic more accurately reflected the actual topics of news articles, making it more effective for understanding the overseas construction market.
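The comparison against ground truth described above can be illustrated with a small F1 computation. This is a minimal sketch, not the authors' procedure: it assumes each article has one ground-truth label and that every model topic has already been mapped to one of those labels, and it uses macro averaging, which the abstract does not specify. The function name `macro_f1` and the toy labels are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the labels that occur in y_true (assumed averaging scheme)."""
    labels = sorted(set(y_true))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: ground-truth article topics vs. mapped model topics.
truth = ["economy", "politics", "economy", "energy"]
predicted = ["economy", "economy", "economy", "energy"]
score = macro_f1(truth, predicted)
print(round(score, 3))
```

A score near 0 means the model topics rarely coincide with the labeled themes, while a score near 1 means they almost always do.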

Topic Modeling on Patent and Article Big Data Using BERTopic and Analyzing Technological Trends of AI Semiconductor Industry (BERTopic을 활용한 텍스트마이닝 기반 인공지능 반도체 기술 및 연구동향 분석)

  • Hyeonkyeong Kim;Junghoon Lee;Sunku Kang
    • Journal of Information Technology Applications and Management / v.31 no.1 / pp.139-161 / 2024
  • The Fourth Industrial Revolution has spurred widespread adoption of AI-based services, driving global interest in AI semiconductors for efficient large-scale computation. Text mining research, historically using LDA, has evolved with machine learning integration, exemplified by the 2021 BERTopic technology. This study employs BERTopic to analyze AI semiconductor-related patents and research data, generating 48 topics from 2,256 patents and 40 topics from 1,112 publications. While providing valuable insights into technology trends, the study acknowledges limitations in taking a macro approach to the entire AI semiconductor industry. Future research may explore specific technologies for more nuanced insights as the industry matures.

Topic Model Augmentation and Extension Method using LDA and BERTopic (LDA와 BERTopic을 이용한 토픽모델링의 증강과 확장 기법 연구)

  • Kim, SeonWook;Yang, Kiduk
    • Journal of the Korean Society for Information Management / v.39 no.3 / pp.99-132 / 2022
  • The purpose of this study is to propose AET (Augmented and Extended Topics), a novel method for synthesizing LDA and BERTopic results, and to apply it experimentally to recently published LIS articles. To this end, 55,442 abstracts from 85 LIS journals in the WoS database, spanning January 2001 to October 2021, were analyzed. AET first constructs a Word2Vec-based cosine similarity matrix between LDA and BERTopic results, then extracts AT (Augmented Topics) by repeating the matrix reordering and segmentation procedures as long as their semantic relations remain valid, and finally determines ET (Extended Topics) by removing any LDA-related residual subtopics from the matrix and ordering the rest by F1 (BERTopic topic size rank, inverse cosine similarity rank). Compared with the baseline LDA result, AT effectively concretized the original LDA topic model and ET discovered new meaningful topics that LDA did not. In the qualitative performance evaluation, AT performed better than LDA, while ET showed similar performance except in a few cases.
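The first AET step, a cosine similarity matrix between LDA topics and BERTopic topics, can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: each topic is assumed to be already represented by a single vector (for example, an average of word vectors for its keywords), and the two-dimensional toy vectors and the names `cosine` and `similarity_matrix` are hypothetical. The later reordering and segmentation steps are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(lda_topics, bertopic_topics):
    """Rows index LDA topics, columns index BERTopic topics."""
    return [[cosine(u, v) for v in bertopic_topics] for u in lda_topics]


# Hypothetical topic vectors (real ones would have hundreds of dimensions).
lda_vectors = [[1.0, 0.0], [0.0, 1.0]]
bertopic_vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
matrix = similarity_matrix(lda_vectors, bertopic_vectors)
for row in matrix:
    print([round(x, 3) for x in row])
```

BERTopic topics with a high similarity to an LDA topic are candidates for refining it, while columns with uniformly low similarity point to topics LDA did not capture.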

Recent Research Trend Analysis for the Journal of Society of Korea Industrial and Systems Engineering Using Topic Modeling (토픽모델링을 활용한 한국산업경영시스템학회지의 최근 연구주제 분석)

  • Dong Joon Park;Pyung Hoi Koo;Hyung Sool Oh;Min Yoon
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.3 / pp.170-185 / 2023
  • The advent of big data has brought about the need for analytics. Natural language processing (NLP), a field of big data, has received a lot of attention. Among NLP techniques, topic modeling is widely applied to identify key topics in various academic journals. The Korean Society of Industrial and Systems Engineering (KSIE) has published academic journals since 1978. To enhance its status, it is imperative to recognize the diversity of research domains. We previously identified eight major research topics in papers published by KSIE from 1978 to 1999. As a follow-up study, we aim to identify the major topics of research papers published in KSIE from 2000 to 2022. We performed topic modeling on 1,742 research papers from this period using LDA and BERTopic, the latter of which has recently attracted attention. BERTopic outperformed LDA by providing a set of coherent topic keywords that can effectively distinguish the 36 topics found in this study. In terms of visualization techniques, pyLDAvis presented better two-dimensional scatter plots for the intertopic distance map than BERTopic. However, BERTopic provided much more diverse visualization methods to explore the relevance of the 36 topics. BERTopic was also able to classify hot and cold topics by presenting 'topic over time' graphs that can identify topic trends over time.

A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS (LDA, Top2Vec, BERTopic 모형의 토픽모델링 비교 연구 - 국외 문헌정보학 분야를 중심으로 -)

  • Yong-Gu Lee;SeonWook Kim
    • Journal of the Korean Society for Library and Information Science / v.58 no.1 / pp.5-30 / 2024
  • The purpose of this study is to extract topics from experimental data using three topic modeling methods (LDA, Top2Vec, and BERTopic) and to compare the characteristics and differences between these models. The experimental data consist of 55,442 papers published in 85 academic journals in the field of library and information science, which are indexed in the Web of Science (WoS). The experimental process was as follows: the first topic modeling results were obtained using the default parameters for each model, and the second topic modeling results were obtained by setting the same optimal number of topics for each model. In the first stage of topic modeling, the LDA, Top2Vec, and BERTopic models generated significantly different numbers of topics (100, 350, and 550, respectively). The Top2Vec and BERTopic models seemed to divide the topics approximately three to five times more finely than the LDA model. There were substantial differences among the models in terms of the average and standard deviation of documents per topic. The LDA model assigned many documents to a relatively small number of topics, while the BERTopic model showed the opposite trend. In the second stage of topic modeling, which generated the same 25 topics for all models, the Top2Vec model tended to assign more documents on average per topic and showed small deviations between topics, resulting in an even distribution across the 25 topics. When comparing the creation of similar topics between models, the LDA and Top2Vec models generated 18 similar topics (72%) out of 25. This high percentage suggests that the Top2Vec model is more similar to the LDA model. For a more comprehensive comparative analysis, expert evaluation is necessary to determine whether the documents assigned to each topic in the topic modeling results are thematically accurate.

Research Trend Analysis on Customer Satisfaction in Service Field Using BERTopic and LDA

  • YANG, Woo-Ryeong;YANG, Hoe-Chang
    • The Journal of Economics, Marketing and Management / v.10 no.6 / pp.27-37 / 2022
  • Purpose: The purpose of this study is to derive various ways to realize customer satisfaction for the development of the service industry by exploring research trends related to customer satisfaction, which is presented as an important goal in the service industry. Research design, data and methodology: To this end, 1,456 papers with English abstracts retrieved from scienceON were analyzed. Using Python 3.7, word frequency and co-occurrence analyses were performed, and topics related to research trends were classified through BERTopic and LDA. Results: The word frequency and co-occurrence frequency analyses showed that words such as quality, intention, and loyalty appeared frequently. BERTopic and LDA yielded 11 topics such as 'catering service' and 'brand justice'. The trend analysis confirmed that 'brand justice' and 'internet shopping' are emerging as relatively important research topics, while CRM is attracting less interest. Conclusions: The results of this study showed that the 7P marketing strategy is working to some extent. Therefore, it is proposed to conduct research related to the acquisition of good customers through service price, customer lifetime value application, and customer segmentation, which are expected to be needed for the development of the service industry.

A Study on Leadership Trends from the Perspective of Domestic Researcher's Using BERTopic and LDA

  • Sung-Su, SHIN;Hoe-Chang, Yang
    • East Asian Journal of Business Economics (EAJBE) / v.11 no.1 / pp.53-71 / 2023
  • Purpose - This study aims to find clues for the direction of leadership development suited to the current situation by exploring how leadership has been studied from the perspective of domestic researchers, along with organizing the leadership theories that have been studied in various ways. Research design, data, and methodology - A total of 7,425 papers were obtained from the search, and the 5,810 papers with English abstracts were used for analysis. Word frequency analysis, word clouds, and co-occurrence analysis were performed using Python 3.7. In addition, after classifying topics related to research trends through BERTopic and LDA, trends were identified through dynamic topic modeling and OLS regression analysis. Result - BERTopic yielded 14 topics such as 'Leadership management and performance' and 'Sports leadership'. Conducting LDA on the 1,976 outliers yielded five further topics. The trend analysis of topics by year confirmed that five topics, such as 'military police leadership', received relatively more attention. Conclusion - Based on the results of this study, a study on the reinterpretation of past leadership studies, a study on LMX with an expanded perspective, and a study on integrated leadership sub-factors of modern leadership theory were proposed.
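The trend identification by OLS regression mentioned above can be illustrated with a simple slope fit of a topic's yearly share against the year. This is a minimal sketch under assumptions: the abstract does not state the regression specification or the criteria used, so the sign-of-slope labels, the function names `ols_slope` and `label_trend`, and the yearly shares below are all hypothetical.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def label_trend(slope):
    """Assumed rule: rising share = 'hot', falling share = 'cold'."""
    if slope > 0:
        return "hot"
    if slope < 0:
        return "cold"
    return "flat"


# Hypothetical share of papers per year assigned to one topic.
years = [2018, 2019, 2020, 2021, 2022]
shares = [0.10, 0.12, 0.14, 0.16, 0.18]
slope = ols_slope(years, shares)
print(round(slope, 4), label_trend(slope))
```

A fuller analysis would also test whether the slope is statistically significant before labeling a topic, which this sketch omits.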

Online Shopping Research Trend Analysis Using BERTopic and LDA

  • Yoon-Hwang, JU;Woo-Ryeong, YANG;Hoe-Chang, YANG
    • The Journal of Economics, Marketing and Management / v.11 no.1 / pp.21-30 / 2023
  • Purpose: As one of the ongoing studies on the distribution industry, this study identifies research trends on online shopping to date in order to propose not only the development of online shopping companies but also the possibility of coexistence between online and offline retailers and the development of the distribution industry. Research design, data and methodology: In this study, the English abstracts of 645 papers on online shopping registered in scienceON were obtained. The abstracts were analyzed with BERTopic and LDA using Python 3.7 to identify which topics interested researchers. Results: The word frequency and co-occurrence analyses showed that studies related to online shopping were frequently conducted on factors such as products, services, and shopping malls. BERTopic yielded five topics such as 'service quality' and 'sales strategy', and LDA yielded three topics including 'purchase experience'. It was confirmed that 'Customer Recommendation' and 'Fashion Mall' attracted relatively high interest, and 'Sales Strategy' attracted relatively low interest. Conclusions: It was suggested that more diverse studies related to the online shopping mall platform, sales content, and factors influencing usage are needed to develop the online shopping industry.

Improvement of topic modeling and case analysis through convergence of Bertopic and TextRank (버토픽과 텍스트랭크의 융합을 통한 토픽모델링의 개선 및 사례 분석)

  • Kim, Keun Hyung;Kang Jae Jung
    • The Journal of Information Systems / v.33 no.3 / pp.105-121 / 2024
  • Purpose: The purpose of this paper is to develop a method that improves topic representation by incorporating the TextRank technique into BERTopic-based topic modeling, together with an additional indicator for determining the optimal number of topics. Design/methodology/approach: In this paper, we propose a method to extract important documents from the documents assigned to each topic of a topic model using the TextRank technique, and to calculate secondary diversity and generate topic representations based on the results. First, we integrate the TextRank algorithm into the BERTopic-based topic modeling process to set local secondary labels for each topic. The secondary labels of each topic are derived through extractive summarization based on the TextRank algorithm. Second, we improve the accuracy of selecting the optimal number of topics by calculating the secondary diversity index based on the extractive summary results of each topic. Third, we improve efficiency by utilizing ChatGPT when deriving the labels of each topic. Findings: A case analysis and evaluation using the proposed method confirmed that topic representation based on TextRank results generated more accurate topic labels and that the secondary diversity index was a more effective index for determining the optimal number of topics.

A Study on Human-Robot Interaction Trends Using BERTopic (BERTopic을 활용한 인간-로봇 상호작용 동향 연구)

  • Jeonghun Kim;Kee-Young Kwahk
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.185-209 / 2023
  • With the advent of the 4th industrial revolution, various technologies have received much attention. Technologies associated with the 4th industrial revolution include the Internet of Things (IoT), big data, artificial intelligence, virtual reality (VR), 3D printers, and robotics, and these technologies are often converged. In particular, the robotics field is combined with technologies such as big data, artificial intelligence, VR, and digital twins. Accordingly, much research using robotics is being conducted and applied to fields such as distribution, airports, hotels, restaurants, and transportation. In this situation, research on human-robot interaction is attracting attention, but the technology has not yet reached the level of user satisfaction. However, research on robots capable of perfect communication is steadily being conducted, and such robots are expected to be able to replace human emotional labor. Therefore, it is necessary to discuss whether current human-robot interaction technology can be applied to business. To this end, this study first examines the trend of human-robot interaction technology. Second, we compare the LDA (Latent Dirichlet Allocation) and BERTopic topic modeling methods. As a result, we found that the concept of human-robot interaction and basic interaction was discussed in the studies from 1992 to 2002. From 2003 to 2012, many studies on social expression were conducted, along with studies related to judgment such as face detection and recognition. In the studies from 2013 to 2022, service topics such as elderly nursing, education, and autism treatment appeared, and research on social expression continued. However, the technology does not yet seem to have reached a level that can be applied to business. The comparison of the LDA and BERTopic topic modeling methods confirmed that BERTopic is a superior method to LDA.