• Title/Summary/Keyword: Topic Modeling(LDA)


Comparison of Topic Modeling Methods for Analyzing Research Trends of Archives Management in Korea: focused on LDA and HDP (국내 기록관리학 연구동향 분석을 위한 토픽모델링 기법 비교 - LDA와 HDP를 중심으로 -)

  • Park, JunHyeong;Oh, Hyo-Jung
    • Journal of Korean Library and Information Science Society / v.48 no.4 / pp.235-258 / 2017
  • This study analyzes research trends in archives management in Korea by comparing LDA (Latent Dirichlet Allocation) topic modeling, the most widely known method in text mining, with HDP (Hierarchical Dirichlet Process) topic modeling, which was developed from LDA. We first collected 1,027 articles on archives management published from 1997 to 2016 in two Korean archives management journals and four Korean library and information science journals, and performed several preprocessing steps. We then conducted LDA and HDP topic modeling. For a more in-depth comparison, we used LDAvis as a topic modeling visualization tool. The results show that LDA topic modeling was influenced by keywords that occur frequently across all topics, whereas HDP topic modeling produced specific keywords that make the characteristics of each topic easy to identify.
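
For readers unfamiliar with LDA, the following is a minimal sketch of collapsed Gibbs sampling for LDA. It is not the authors' implementation; the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus are all assumptions made for illustration. Note that the number of topics must be fixed in advance, which is the constraint that HDP relaxes by inferring it from the data.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                              # tokens assigned to topic k
    assign = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed document-topic proportions; each row sums to 1.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]

# Toy corpus (hypothetical): two documents use words 0-2, two use words 3-5.
docs = [[0, 1, 0, 1, 2], [0, 2, 1, 1], [3, 4, 5, 3], [4, 5, 5, 3, 4]]
theta = lda_gibbs(docs, num_topics=2, vocab_size=6)
```

Each row of `theta` is one document's estimated mixture over the two topics.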

Research Topic Analysis of the Domestic Papers Related to COVID-19 Using LDA (LDA를 사용한 COVID-19 관련 국내 논문의 연구 토픽 분석)

  • Kim, Eun-Hoe;Suh, Yu-Hwa
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.15 no.5 / pp.423-432 / 2022
  • This paper uses LDA topic modeling to analyze a total of 10,599 papers related to COVID-19, published from January 2020 to July 2022 and collected from the KCI site, so that academic researchers can understand the overall research trend. The results of LDA topic modeling are analyzed by major research category so that researchers can easily identify topics in their own fields. Then, for each topic, the detailed research categories in which the most research is conducted are analyzed. Because it is important for academic researchers to understand how research topics change over time, the trend of each topic is analyzed and presented using time series decomposition.
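
The abstract does not state which decomposition method was used. As an illustration only, the sketch below assumes a classical additive decomposition (series = trend + seasonal + residual) applied to one topic's share per time step; `decompose_additive` and the toy series are hypothetical.

```python
def decompose_additive(series, period):
    """Classical additive decomposition using a centered moving average."""
    n = len(series)
    half = period // 2
    trend = [None] * n  # undefined at the edges, where the window does not fit
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # 2 x period moving average: the two end points get half weight.
            trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    # Average the detrended values at each position within the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) if b else 0.0 for b in buckets]
    center = sum(means) / period  # re-center so seasonal effects sum to zero
    seasonal = [means[t % period] - center for t in range(n)]
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual

# Toy series (hypothetical): a linear trend plus a repeating 4-step pattern.
pattern = [1.0, -1.0, 2.0, -2.0]
series = [10 + 0.5 * t + pattern[t % 4] for t in range(16)]
trend, seasonal, residual = decompose_additive(series, period=4)
```

On this toy series the trend component recovers the underlying line and the seasonal component recovers the repeating pattern.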

Performance Improvement of Topic Modeling using BART based Document Summarization (BART 기반 문서 요약을 통한 토픽 모델링 성능 향상)

  • Eun Su Kim;Hyun Yoo;Kyungyong Chung
    • Journal of Internet Computing and Services / v.25 no.3 / pp.27-33 / 2024
  • The environment of academic research is continuously changing as the volume of information increases, which raises the need for an effective way to analyze and organize large numbers of documents. In this paper, we propose a method for improving topic modeling performance through document summarization based on BART (Bidirectional and Auto-Regressive Transformers). The proposed method uses a BART-based document summarization model to extract the core content of each document and then applies the LDA (Latent Dirichlet Allocation) algorithm to the summaries. We validate through experiments that document summarization improves the performance and efficiency of LDA topic modeling. The experimental results show that the BART-based summarization model captures the important information of the original articles, with F1 scores of 0.5819, 0.4384, and 0.5038 on ROUGE-1, ROUGE-2, and ROUGE-L, respectively. In addition, topic modeling on the summarized documents performs about 8.08% better than topic modeling on the full text when compared using the perplexity metric. This contributes to reducing data throughput and improving efficiency in the topic modeling process.
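
As a reminder of what the perplexity metric measures, the sketch below computes per-word perplexity from a document-topic matrix and a topic-word matrix. The names `theta`, `phi`, and `relative_improvement` are assumptions for illustration, and the abstract does not say how its percentage was derived; a relative reduction is assumed here. Lower perplexity is better.

```python
import math

def perplexity(docs, theta, phi):
    """Per-word perplexity: exp(-(sum of log p(w | d)) / number of tokens)."""
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # p(w | d) = sum over topics k of theta[d][k] * phi[k][w]
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)

def relative_improvement(baseline, candidate):
    """Percentage by which the candidate perplexity is lower than the baseline."""
    return 100.0 * (baseline - candidate) / baseline

# Toy check (hypothetical values): a uniform model over 4 words has perplexity 4.
docs = [[0, 1], [2, 3]]
theta = [[0.5, 0.5], [0.5, 0.5]]
phi = [[0.25] * 4, [0.25] * 4]
uniform_ppl = perplexity(docs, theta, phi)
```

A model that spreads probability uniformly over V words has perplexity V, which gives a useful sanity bound when reading reported values.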

KOSPI index prediction using topic modeling and LSTM

  • Jin-Hyeon Joo;Geun-Duk Park
    • Journal of the Korea Society of Computer and Information / v.29 no.7 / pp.73-80 / 2024
  • In this paper, we propose a method to improve the accuracy of predicting the Korea Composite Stock Price Index (KOSPI) by combining topic modeling with Long Short-Term Memory (LSTM) neural networks. We use the Latent Dirichlet Allocation (LDA) technique to extract ten major topics related to interest rate increases and decreases from financial news data. The extracted topics, along with historical KOSPI index data, are input into an LSTM model to predict the KOSPI index. The proposed model thus combines time series prediction, which feeds the historical KOSPI index into the LSTM, with topic modeling, which feeds in news data. To verify the performance of the proposed model, this paper designs four models (LSTM_K model, LSTM_KNS model, LDA_K model, LDA_KNS model) that differ in the type of input data given to the LSTM, and presents the predictive performance of each. The comparison shows that the LSTM model (LDA_K model), which uses financial news topic data and historical KOSPI index data as inputs, recorded the lowest RMSE (Root Mean Square Error), demonstrating the best predictive performance.
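
The abstract does not describe the exact input layout. One plausible arrangement, sketched below under that assumption, concatenates each day's index value with that day's topic proportions and slices the result into fixed-length windows for a sequence model; `make_windows`, the window length, and the toy numbers are hypothetical. The RMSE function follows the standard definition.

```python
import math

def make_windows(index_values, topic_features, window):
    """Build (inputs, target) pairs for a sequence model.

    Each time step's feature vector is [index value] + topic proportions;
    the target is the index value immediately after the window.
    """
    samples = []
    for end in range(window, len(index_values)):
        steps = [
            [index_values[t]] + list(topic_features[t])
            for t in range(end - window, end)
        ]
        samples.append((steps, index_values[end]))
    return samples

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Toy data (hypothetical): 5 days of index values and 2 topic proportions per day.
index_values = [100.0, 101.0, 103.0, 102.0, 104.0]
topic_features = [[0.7, 0.3], [0.6, 0.4], [0.5, 0.5], [0.2, 0.8], [0.4, 0.6]]
samples = make_windows(index_values, topic_features, window=3)
```

Each element of `samples` holds a 3-step input sequence with 3 features per step, plus the next index value as the prediction target.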

Topic Modeling Analysis of Franchise Research Trends Using LDA Algorithm (LDA 알고리즘을 이용한 프랜차이즈 연구 동향에 대한 토픽모델링 분석)

  • YANG, Hoe-Chang
    • The Korean Journal of Franchise Management / v.12 no.4 / pp.13-23 / 2021
  • Purpose: This study aimed to derive clues that would help the franchise industry overcome difficulties such as various legal regulations and demands for social responsibility, and continue to develop, by analyzing trends in franchise-related research published in Korea. Research design, data and methodology: By searching for 'franchise' in ScienceON, abstracts were collected from papers published in domestic academic journals from 1994 to June 2021. Keywords were extracted from the abstracts of 1,110 valid papers and preprocessed; keyword analysis, TF-IDF analysis, topic modeling using the LDA algorithm, and trend analysis of the top 20 TF-IDF words by year group were then carried out using R. Results: Keyword analysis showed that businesses and brands were the main subjects of franchise research, that interest in service and satisfaction was considerable, and that food and coffee were the most prominently studied industries. In the TF-IDF calculation, brand, satisfaction, franchisor, and coffee ranked at the top. LDA-based topic modeling yielded a total of 12 topics, including "growth strategy", which were visualized with LDAvis. However, the areas of Topic 1 (growth strategy) and Topic 9 (organizational culture), Topic 4 (consumption experience) and Topic 6 (contribution and loyalty), and Topic 7 (brand image) and Topic 10 (commercial area) overlapped significantly. Finally, trend analysis of the top 20 TF-IDF keywords showed that 10 keywords, such as quality, brand, food, and trust, would be used more overall. Conclusions: The results confirmed the direction of interest in the franchise industry and showed that clues for continuous growth need to be sought through research in more diverse fields. Suggesting a technique that can supplement the problems of topic trend analysis was also considered an important finding. The results therefore indicate that researchers can gain significant insights for selecting research topics, and practitioners for anticipating future changes in franchising.
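
The study ran its TF-IDF analysis in R and does not state which weighting variant was used. The sketch below is written in Python for illustration and assumes one common variant, with term frequency normalized by document length and idf = log(N / df); the function `tf_idf` and the toy keyword lists are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy keyword lists (hypothetical), one per abstract.
docs = [
    ["brand", "coffee", "brand"],
    ["brand", "franchisor"],
    ["coffee", "satisfaction", "service"],
]
weights = tf_idf(docs)
```

Under this variant a term that appears in every document receives a weight of zero, so rankings depend on how widely a keyword is spread across the corpus as well as how often it occurs.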

A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS (LDA, Top2Vec, BERTopic 모형의 토픽모델링 비교 연구 - 국외 문헌정보학 분야를 중심으로 -)

  • Yong-Gu Lee;SeonWook Kim
    • Journal of the Korean Society for Library and Information Science / v.58 no.1 / pp.5-30 / 2024
  • The purpose of this study is to extract topics from experimental data using three topic modeling methods (LDA, Top2Vec, and BERTopic) and to compare the characteristics of, and differences between, these models. The experimental data consist of 55,442 papers published in 85 academic journals in the field of library and information science that are indexed in the Web of Science (WoS). The experiment proceeded in two stages: the first topic modeling results were obtained using the default parameters of each model, and the second were obtained by setting the same optimal number of topics for each model. In the first stage, the LDA, Top2Vec, and BERTopic models generated significantly different numbers of topics (100, 350, and 550, respectively). The Top2Vec and BERTopic models divided the topics approximately three to five times more finely than the LDA model. There were also substantial differences among the models in the average and standard deviation of documents per topic. The LDA model assigned many documents to a relatively small number of topics, while the BERTopic model showed the opposite trend. In the second stage, in which all models generated the same 25 topics, the Top2Vec model tended to assign more documents per topic on average and showed small deviations between topics, resulting in an even distribution across the 25 topics. When similar topics were compared between models, the LDA and Top2Vec models generated 18 similar topics (72%) out of 25. This high percentage suggests that the Top2Vec model is the more similar to the LDA model. For a more comprehensive comparison, expert evaluation is necessary to determine whether the documents assigned to each topic are thematically accurate.
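
The abstract does not say how two topics were judged similar. The sketch below shows one possible approach, assumed here only for illustration: compare the top keywords of each topic with Jaccard overlap and count a match when the best overlap reaches a threshold. The function names, the 0.5 threshold, and the toy keyword lists are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two keyword lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def count_similar_topics(topics_a, topics_b, threshold=0.5):
    """Count topics in A whose best overlap with any topic in B meets the threshold."""
    matched = 0
    for words_a in topics_a:
        best = max(jaccard(words_a, words_b) for words_b in topics_b)
        if best >= threshold:
            matched += 1
    return matched

# Toy top-keyword lists (hypothetical) for two models with three topics each.
topics_a = [
    ["library", "service", "user"],
    ["citation", "journal", "impact"],
    ["health", "information", "patient"],
]
topics_b = [
    ["library", "user", "service", "public"],
    ["citation", "impact", "index"],
    ["social", "media", "twitter"],
]
similar = count_similar_topics(topics_a, topics_b)
share = similar / len(topics_a)
```

The reported figure follows the same arithmetic: 18 matched topics out of 25 gives 18 / 25 = 72%.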

How the Journal of the Korean Association for Science Education(JKASE) Changed for the Past 44 Years?: Topic Modeling Analysis Using Latent Dirichlet Allocation (한국과학교육학회지는 44년간 어떤 주제로 어떻게 변화했는가? -잠재 디리클레 할당(LDA)을 활용한 토픽모델링 분석-)

  • Chang, Jina;Na, Jiyeon
    • Journal of The Korean Association For Science Education / v.42 no.2 / pp.185-200 / 2022
  • The purpose of this study is to understand the trends and changes in the articles published in the Journal of the Korean Association for Science Education (JKASE) over the past forty-four years. To this end, Latent Dirichlet Allocation (LDA) topic modeling was performed on a total of 2,115 English abstracts of papers published in the JKASE from 1978 to 2021. The analysis extracted a total of 23 topics, and each topic was presented with its related keywords and articles. Next, to examine how these topics changed over time, we used heatmaps to visualize the average weight of each topic in each 4-year period and identified the topics that rose or fell. The results provide new insights into science education research in Korea by revealing not only traditional research topics that have been studied consistently but also topics that have changed in response to developments in educational philosophy or research methods and to social or policy demands related to science education.
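
The data behind such a heatmap can be sketched as follows: group the documents into fixed-length periods by publication year and average each topic's weight within each period. This is an illustration under assumptions, not the authors' code; `period_topic_means` and the toy values are hypothetical.

```python
from collections import defaultdict

def period_topic_means(years, doc_topic, start_year, span):
    """Average each topic's weight over documents grouped into span-year periods."""
    grouped = defaultdict(list)
    for year, weights in zip(years, doc_topic):
        # Label each period by its first year, e.g. 1978, 1982, 1986, ...
        period_start = start_year + ((year - start_year) // span) * span
        grouped[period_start].append(weights)
    means = {}
    for period_start, rows in sorted(grouped.items()):
        n_topics = len(rows[0])
        means[period_start] = [
            sum(row[k] for row in rows) / len(rows) for k in range(n_topics)
        ]
    return means

# Toy data (hypothetical): four abstracts with weights over two topics.
years = [1978, 1980, 1982, 1985]
doc_topic = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.3, 0.7]]
means = period_topic_means(years, doc_topic, start_year=1978, span=4)
```

With a 4-year span starting in 1978, the years 1978 to 2021 fall into 11 periods, so a 23-topic model would give a 23 by 11 grid of averages.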

Evaluation of Topic Modeling Performance for Overseas Construction Market Analysis Using LDA and BERTopic on News Articles (LDA 및 BERTopic 기반 해외건설시장 뉴스 기사 토픽모델링 성능평가)

  • Baik, Joonwoo;Chung, Sehwan;Chi, Seokho
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.6 / pp.811-819 / 2023
  • Understanding the local conditions is a crucial factor in enhancing the success potential of overseas construction projects. This can be achieved through the analysis of news articles of the target market using topic modeling techniques. In this study, the authors aimed to analyze news articles using two topic modeling methods, namely Latent Dirichlet Allocation (LDA) and BERTopic, in order to determine the optimal approach for market condition analysis. To evaluate the alignment between the generated topics and the actual themes of the news documents, the research collected 6,273 BBC news articles, created ground truth data for individual news article topics, and finally compared this ground truth with the results of the topic modeling. The F1 score for LDA was 0.011, while BERTopic achieved a score of 0.244. These results indicate that BERTopic more accurately reflected the actual topics of news articles, making it more effective for understanding the overseas construction market.
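
The abstract does not specify how the F1 score was aggregated. For illustration, the sketch below assumes each article has one ground-truth label and one predicted label (for example, its dominant topic mapped to a theme) and computes a macro-averaged F1; `macro_f1` and the toy labels are hypothetical.

```python
def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of per-label F1 scores."""
    pairs = list(zip(true_labels, predicted_labels))
    labels = sorted(set(true_labels) | set(predicted_labels))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)

# Toy labels (hypothetical) for four news articles.
true_labels = ["economy", "economy", "policy", "policy"]
predicted_labels = ["economy", "policy", "policy", "policy"]
score = macro_f1(true_labels, predicted_labels)
```

F1 is the harmonic mean of precision and recall, so a score near zero means the predicted topics almost never coincide with the ground-truth themes.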

A Study on the Trends of Construction Safety Accident in Unstructured Text Using Topic Modeling (비정형 텍스트 기반의 토픽 모델링을 이용한 건설 안전사고 동향 분석)

  • Lee, Sang-Gyu
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.10 / pp.176-182 / 2018
  • To understand and track trends in construction safety accidents, this study presents topic trends in construction safety accidents using an LDA (Latent Dirichlet Allocation)-based topic modeling method. In particular, it identifies the main issues in construction safety accidents through topic modeling of unstructured data, rather than through the various structured data analyses commonly used for preventing safety accidents in the construction industry. To apply this methodology, I randomly collected 540 news articles about construction accidents published from January 2017 to February 2018. Applying LDA-based topic modeling to this unstructured data, I found 10 topics and identified key issues through the 10 keywords of each topic. I then forecasted topic issues related to construction safety accidents by analyzing time-series trends in the news data over the same period. This research thus suggests ways of using unstructured news article data to anticipate safety policy and research directions and to respond to future construction accident safety issues.
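
The abstract does not describe its trend analysis in detail. One simple way to flag rising and falling topics, assumed here only for illustration, is to fit a least-squares line to each topic's monthly article count and read the sign of the slope; `trend_slope` and the toy counts are hypothetical.

```python
def trend_slope(values):
    """Least-squares slope of values against time steps 0, 1, 2, ... (needs 2+ values)."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den

# Toy monthly article counts (hypothetical) for three topics.
monthly_counts = {
    "topic_a": [2, 4, 6, 8],
    "topic_b": [5, 5, 5, 5],
    "topic_c": [9, 7, 5, 3],
}
slopes = {name: trend_slope(counts) for name, counts in monthly_counts.items()}
```

A positive slope marks a topic that is gaining coverage and a negative slope one that is fading; with only 14 monthly points, as in this study's date range, such slopes are rough indicators at best.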

Active Senior Contents Trend Analysis using LDA Topic Modeling (LDA 토픽 모델링을 이용한 액티브 시니어 콘텐츠 트렌드 분석)

  • Lee, Dongwoo;Kim, Yoosin;Shin, Eunjung
    • Journal of Internet Computing and Services / v.22 no.5 / pp.35-45 / 2021
  • The purpose of this study is to understand the characteristics and trends of active seniors. As the baby boom generation reaches old age, its members are more active than earlier generations of seniors. These seniors are called active seniors and form a new consumer group. Many countries and companies are interested in providing relevant policies and services, but research on active senior trends is lacking. This study collected 8,740 posts related to active seniors on social media from January 1st, 2018 to June 31st, 2021, and conducted keyword frequency analysis, TF-IDF analysis, and LDA topic modeling. Through LDA topic modeling, the topics were classified into 10 categories: lifestyle, benefits, shopping, government business, government education, health, society and economy, care industry, silver housing, and leisure. The results of this study can be used as fundamental data for understanding the academic and industrial aspects of active seniors.