• Title/Summary/Keyword: LDA Topic Analysis


Comparison of Topic Modeling Methods for Analyzing Research Trends of Archives Management in Korea: focused on LDA and HDP (국내 기록관리학 연구동향 분석을 위한 토픽모델링 기법 비교 - LDA와 HDP를 중심으로 -)

  • Park, JunHyeong;Oh, Hyo-Jung
    • Journal of Korean Library and Information Science Society / v.48 no.4 / pp.235-258 / 2017
  • The purpose of this study is to analyze research trends in archives management in Korea by comparing LDA (Latent Dirichlet Allocation) topic modeling, the best-known method in text mining, with HDP (Hierarchical Dirichlet Process) topic modeling, which extends LDA. First, we collected 1,027 articles related to archives management published from 1997 to 2016 in two archives management journals and four library and information science journals in Korea, and performed several preprocessing steps. We then conducted LDA and HDP topic modeling. For a more in-depth comparison, we used LDAvis as a topic model visualization tool. As a result, LDA topic modeling was influenced by high-frequency keywords across all topics, whereas HDP topic modeling produced specific keywords that made the characteristics of each topic easy to identify.
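For readers unfamiliar with how LDA assigns words to topics, the following is a minimal collapsed Gibbs sampler for LDA in plain Python. It is a sketch, not the implementation used by Park and Oh (the abstract does not name one); the toy corpus, the number of topics `K`, the priors `alpha` and `beta`, and the iteration count are all illustrative assumptions. HDP, which infers the number of topics instead of fixing `K` in advance, is not shown.

```python
import random


def lda_gibbs(docs, K, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word strings.
    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word distributions, and the vocabulary order used in phi.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables: document-topic, topic-word, and tokens per topic.
    ndk = [[0] * K for _ in docs]
    nkw = [[0] * V for _ in range(K)]
    nk = [0] * K

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab


# Hypothetical toy corpus, already tokenized.
docs = [
    "archive record retention archive record".split(),
    "library catalog reader library service".split(),
    "record archive appraisal retention".split(),
    "reader service library catalog".split(),
]
theta, phi, vocab = lda_gibbs(docs, K=2)
for k, dist in enumerate(phi):
    top = sorted(zip(dist, vocab), reverse=True)[:3]
    print(k, [w for _, w in top])
```

The top words per topic printed at the end are the same kind of keyword list the abstract compares between LDA and HDP; on real data one would normally use an established library rather than this sampler.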

Topic Modeling Analysis of Franchise Research Trends Using LDA Algorithm (LDA 알고리즘을 이용한 프랜차이즈 연구 동향에 대한 토픽모델링 분석)

  • YANG, Hoe-Chang
    • The Korean Journal of Franchise Management / v.12 no.4 / pp.13-23 / 2021
  • Purpose: This study aimed to derive clues that can help the franchise industry overcome difficulties such as legal regulations and demands for social responsibility and continue to develop, by analyzing research on franchises published in Korea. Research design, data and methodology: Abstracts of papers published in domestic academic journals from 1994 to June 2021 were collected by searching for 'franchise' in ScienceON. Keywords were extracted from the abstracts of 1,110 valid papers, and after preprocessing, keyword analysis, TF-IDF analysis, topic modeling with the LDA algorithm, and trend analysis of the top 20 TF-IDF words by year group were carried out using R packages. Results: The keyword analysis showed that businesses and brands were the main subjects of franchise research, that interest in service and satisfaction was considerable, and that food and coffee were the most prominently studied industries. In the TF-IDF calculation, brand, satisfaction, franchisor, and coffee ranked at the top. LDA-based topic modeling derived a total of 12 topics, including "growth strategy", which were visualized with LDAvis. The areas of Topic 1 (growth strategy) and Topic 9 (organizational culture), Topic 4 (consumption experience) and Topic 6 (contribution and loyalty), and Topic 7 (brand image) and Topic 10 (commercial area) overlap significantly. Finally, the trend analysis of the 20 keywords with the highest TF-IDF showed that 10 of them, such as quality, brand, food, and trust, are expected to be used more overall. Conclusions: The results confirmed the direction of interest in the franchise industry and showed that the industry needs to find clues for continuous growth through research in more diverse fields.
Suggesting a technique that can supplement the weaknesses of topic trend analysis was also considered an important finding. The results of this study therefore offer significant insights to researchers in selecting research topics and to practitioners in anticipating future changes in franchising.
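TF-IDF ranks words that are frequent in a document but rare across the corpus. The abstract does not say which weighting variant the R packages applied, so the sketch below assumes one common form: term frequency normalized by document length, multiplied by the natural log of (number of documents / number of documents containing the term). The three "abstracts" are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf} dict per tokenized document.

    tf  = count of the term in the document / document length
    idf = ln(N / number of documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        scores.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


abstracts = [
    "brand satisfaction franchise brand".split(),
    "coffee franchise satisfaction".split(),
    "franchisor franchise contract".split(),
]
scores = tf_idf(abstracts)
# "franchise" occurs in every abstract, so its idf and tf-idf are 0.
print(sorted(scores[0].items(), key=lambda kv: -kv[1]))
```

This shows why a word like "franchise", present in every collected abstract, would not appear near the top of a TF-IDF ranking even though it tops a raw frequency count.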

Topic Extraction and Classification Method Based on Comment Sets

  • Tan, Xiaodong
    • Journal of Information Processing Systems / v.16 no.2 / pp.329-342 / 2020
  • In recent years, emotional text classification has become one of the essential research topics in natural language processing. It has been widely used for sentiment analysis of reviews of products and services such as hotels, and of other commentary corpora. This paper proposes an improved W-LDA (weighted latent Dirichlet allocation) topic model to address the shortcomings of the traditional LDA topic model. During Gibbs sampling of word topics and calculation of the expected word distributions in W-LDA, an average weighted value is used to keep topic-related words from being submerged by high-frequency words, which improves the distinction between topics. The method then applies a support vector machine (SVM), a high-performing classification algorithm, to the extracted high-quality document-topic distributions and topic-word vectors. Finally, an efficient integrated method is constructed for extracting emotional words, calculating topic distributions, and classifying sentiment. Tests on real teaching evaluation data and a public comment test set show that the proposed method has distinct advantages over two other typical algorithms in terms of topic differentiation, classification precision, and F1-measure.

Research Topic Analysis of the Domestic Papers Related to COVID-19 Using LDA (LDA를 사용한 COVID-19 관련 국내 논문의 연구 토픽 분석)

  • Kim, Eun-Hoe;Suh, Yu-Hwa
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.15 no.5 / pp.423-432 / 2022
  • This paper uses LDA topic modeling to analyze a total of 10,599 papers related to COVID-19, published from January 2020 to July 2022 and collected from the KCI site, so that academic researchers can understand the overall research trend. The LDA results are analyzed by major research category so that researchers can easily identify the topics in their own fields. Then, for each topic, the detailed research categories in which the most research has been conducted are analyzed. Because it is important for academic researchers to understand how research topics change over time, the paper also analyzes and presents topic trends using time series decomposition.
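The abstract does not specify which decomposition method or seasonal period was used. As a hedged illustration, the sketch below implements classical additive decomposition (trend from a centered moving average, seasonal component from averaged detrended values, residual as whatever remains) for a series such as the monthly share of papers assigned to one topic. The period and the toy series are assumptions, not the paper's data.

```python
def centered_moving_average(series, period):
    """Trend estimate; None where the window would run off either end."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # 2 x m moving average: the two end points get half weight.
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend


def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full cycles of data")
    trend = centered_moving_average(series, period)
    # Average the detrended values at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    raw = [sum(b) / len(b) for b in buckets]
    # Center the seasonal pattern so it sums to zero over one cycle.
    offset = sum(raw) / period
    pattern = [s - offset for s in raw]
    seasonal = [pattern[t % period] for t in range(len(series))]
    residual = [
        None if tr is None else y - tr - s
        for y, tr, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, residual


# Toy topic series: a linear trend plus a repeating 3-step pattern.
series = [t + [1, -1, 0][t % 3] for t in range(12)]
trend, seasonal, residual = decompose_additive(series, period=3)
print(trend)
print(seasonal[:3])
```

Reading the trend component rather than the raw counts is what lets a rising or falling topic stand out from regular periodic fluctuations; statistical packages offer equivalent, more complete routines.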

Active Senior Contents Trend Analysis using LDA Topic Modeling (LDA 토픽 모델링을 이용한 액티브 시니어 콘텐츠 트렌드 분석)

  • Lee, Dongwoo;Kim, Yoosin;Shin, Eunjung
    • Journal of Internet Computing and Services / v.22 no.5 / pp.35-45 / 2021
  • The purpose of this study is to understand the characteristics and trends of active seniors. As the baby boom generation reaches old age, its members are more active than previous generations of seniors. These seniors, called active seniors, form a new consumer group. Many countries and companies are interested in providing relevant policies and services, but research on active senior trends is lacking. This study collected 8,740 social media posts related to active seniors from January 1, 2018, to the end of June 2021 and conducted keyword frequency analysis, TF-IDF analysis, and LDA topic modeling. LDA topic modeling classified the topics into 10 categories: lifestyle, benefits, shopping, government business, government education, health, society and economy, care industry, silver housing, and leisure. The results of this study can serve as fundamental data for understanding the academic and industrial aspects of active seniors.

How the Journal of the Korean Association for Science Education(JKASE) Changed for the Past 44 Years?: Topic Modeling Analysis Using Latent Dirichlet Allocation (한국과학교육학회지는 44년간 어떤 주제로 어떻게 변화했는가? -잠재 디리클레 할당(LDA)을 활용한 토픽모델링 분석-)

  • Chang, Jina;Na, Jiyeon
    • Journal of The Korean Association For Science Education / v.42 no.2 / pp.185-200 / 2022
  • The purpose of this study is to understand the trends and changes in articles published in the Journal of the Korean Association for Science Education (JKASE) over the past forty-four years. To this end, Latent Dirichlet Allocation (LDA) topic modeling was performed on a total of 2,115 English abstracts of papers published in JKASE from 1978 to 2021. The analysis extracted a total of 23 topics, and each topic was presented with its related keywords and articles. Next, to examine how these topics changed over time, we visualized the average weight of each topic for each 4-year period using heatmaps and identified the topics that have risen or fallen. The results provide new insights into science education research in Korea by revealing both traditional research topics that have been studied consistently and topics that have changed in response to developments in educational philosophy and research methods and to social or policy demands related to science education.
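The heatmap described above needs one row of average topic weights per 4-year period. A minimal sketch of that aggregation step follows, assuming each document already has a topic-proportion vector from LDA; the years and weights are hypothetical, and the plotting itself is omitted.

```python
from collections import defaultdict


def average_topic_weights_by_period(doc_years, doc_topics, start_year, width=4):
    """Average document-topic weights within fixed-width year bins.

    doc_years:  publication year of each document
    doc_topics: topic-proportion vector of each document (same order)
    Returns {bin start year: averaged topic vector}, ordered by year.
    """
    sums = {}
    counts = defaultdict(int)
    for year, weights in zip(doc_years, doc_topics):
        # Map each year to the first year of its bin, e.g. 1980 -> 1978.
        bin_start = start_year + (year - start_year) // width * width
        if bin_start not in sums:
            sums[bin_start] = [0.0] * len(weights)
        for k, w in enumerate(weights):
            sums[bin_start][k] += w
        counts[bin_start] += 1
    return {b: [s / counts[b] for s in sums[b]] for b in sorted(sums)}


years = [1978, 1980, 1983, 2020, 2021]
topics = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.3, 0.7], [0.5, 0.5]]
table = average_topic_weights_by_period(years, topics, start_year=1978)
for bin_start, avg in table.items():
    print(f"{bin_start}-{bin_start + 3}", [round(a, 2) for a in avg])
```

With `start_year=1978` and `width=4`, the span 1978 to 2021 divides into eleven bins (1978-1981 through 2018-2021), so each heatmap column would correspond to one bin.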

Multi-Topic Sentiment Analysis using LDA for Online Review (LDA를 이용한 온라인 리뷰의 다중 토픽별 감성분석 - TripAdvisor 사례를 중심으로 -)

  • Hong, Tae-Ho;Niu, Hanying;Ren, Gang;Park, Ji-Young
    • The Journal of Information Systems / v.27 no.1 / pp.89-110 / 2018
  • Purpose: Customer reviews contain much information, but finding the key information in a large volume of text is not easy, and business decision makers need a model to solve this problem. In this study, we propose a multi-topic sentiment analysis approach using Latent Dirichlet Allocation (LDA) for user-generated content (UGC). Design/methodology/approach: We collected a total of 104,039 hotel reviews for seven of the world's top tourist destinations from TripAdvisor (www.tripadvisor.com) and used the LDA model to extract 30 hotel-related topics from all customer reviews. Six major dimensions (value, cleanliness, rooms, service, location, and sleep quality) were selected from the 30 extracted topics. The data were analyzed in R. Findings: This study proposes a lexicon-based sentiment analysis approach for the sentences within a review that contain keywords related to the six dimensions. The performance of the proposed model was evaluated by comparing the sentiment analysis results for each topic with the actual attribute ratings provided by the platform. The results show that the model performs well, with high accuracy and recall. The proposed model is expected to make it possible to analyze customers' sentiments on different topics in reviews that lack detailed attribute ratings.
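The Findings describe scoring only the sentences that mention a given dimension's keywords. The sketch below illustrates that idea under loud assumptions: the keyword sets, the polarity lexicon, and the scoring rule (sum of word polarities) are hypothetical stand-ins, since the abstract does not give the actual lexicon or keywords.

```python
import re

# Hypothetical keyword lists and polarity lexicon, for illustration only.
DIMENSION_KEYWORDS = {
    "cleanliness": {"clean", "dirty", "spotless"},
    "location": {"location", "station", "walk"},
    "service": {"staff", "service", "reception"},
}
LEXICON = {
    "clean": 1, "spotless": 1, "great": 1, "friendly": 1, "helpful": 1,
    "dirty": -1, "rude": -1, "noisy": -1, "far": -1,
}


def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dimension_sentiment(review):
    """Sum lexicon polarity over the sentences that mention each dimension."""
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    scores = {}
    for dim, keywords in DIMENSION_KEYWORDS.items():
        matched = [tokenize(s) for s in sentences if keywords & set(tokenize(s))]
        if matched:  # dimensions the review never mentions get no score
            scores[dim] = sum(LEXICON.get(w, 0) for words in matched for w in words)
    return scores


review = ("The room was spotless and clean. Staff were rude. "
          "Great location, a short walk to the station!")
print(dimension_sentiment(review))
```

Per-dimension scores like these are what could then be compared against a platform's attribute ratings (for example, by checking whether the sign agrees), as the evaluation step in the abstract describes.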

A Study on Identifying Topics and Trends in International Cadastral Research Using LDA: With Special Reference to the FIG Peer Review Journal (LDA를 이용한 국제지적연구의 주제와 추세확인에 관한 연구: 특히 FIG Peer Review Journal을 중심으로)

  • Kim, Yun-Ki
    • Journal of Cadastre & Land InformatiX / v.48 no.1 / pp.15-33 / 2018
  • The main purpose of this study was to identify the topics and research trends of international cadastral research using LDA. To achieve this goal, I reviewed the literature on LDA and on international cadastral studies and formulated four research questions concerning the topics studied by cadastral researchers, the distribution of those topics, the most influential topics, and changes in topics over time. To answer these questions, I used LDA to analyze 370 papers published in the FIG Peer Review Journal between January 1, 2008, and October 31, 2017. The analysis identified twelve major topics in international cadastral research. The most influential of these was topic 2 (cadastral information systems), and topic 5 (land development and land administration) was also found to play an important role across the whole document collection. These two topics have been the most popular over the past decade, with very active trendlines, and will play a leading role in future cadastral research.

Evaluation of Topic Modeling Performance for Overseas Construction Market Analysis Using LDA and BERTopic on News Articles (LDA 및 BERTopic 기반 해외건설시장 뉴스 기사 토픽모델링 성능평가)

  • Baik, Joonwoo;Chung, Sehwan;Chi, Seokho
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.6 / pp.811-819 / 2023
  • Understanding local conditions is crucial to improving the chances of success of overseas construction projects, and this can be achieved by applying topic modeling techniques to news articles about the target market. In this study, the authors analyzed news articles using two topic modeling methods, Latent Dirichlet Allocation (LDA) and BERTopic, to determine the better approach for market condition analysis. To evaluate how well the generated topics align with the actual themes of the news documents, the study collected 6,273 BBC news articles, created ground-truth topic labels for individual articles, and compared this ground truth with the topic modeling results. LDA achieved an F1 score of 0.011, while BERTopic achieved 0.244. These results indicate that BERTopic reflected the actual topics of the news articles more accurately, making it more effective for understanding the overseas construction market.
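The abstract reports F1 scores but does not define exactly how generated topics were matched to the ground truth. As a hedged sketch, assume each article has a set of ground-truth topic labels and a set of predicted labels (after topics have been mapped to label names); micro-averaged precision, recall, and F1 can then be computed as follows. The labels are hypothetical.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged precision, recall, and F1 over per-article label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # correct labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


truth = [{"economy", "energy"}, {"politics"}, {"infrastructure"}]
pred = [{"economy"}, {"economy"}, {"infrastructure", "politics"}]
p, r, f1 = micro_f1(truth, pred)
print(p, r, f1)
```

Because F1 is the harmonic mean of precision and recall, a score as low as 0.011 means the generated topics rarely matched the ground-truth labels at all under whatever matching rule was used.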

Topic Modeling Analysis of Beauty Industry using BERTopic and LDA

  • YANG, Hoe-Chang;LEE, Won-Dong
    • The Journal of Economics, Marketing and Management / v.10 no.6 / pp.1-7 / 2022
  • Purpose: The purpose of this study is to identify the research trends of degree papers related to the beauty industry and to provide information that can contribute to the development of the domestic beauty industry and guide future research on it. Research design, data and methodology: This study used 154 academic papers and 189 academic papers with English abstracts out of 299 academic papers, all found by searching for the keyword "beauty industry" in ScienceON on August 15, 2022. BERTopic and LDA (Latent Dirichlet Allocation) analyses were conducted using Python 3.7, and OLS regression analysis was conducted to capture the annual increase or decrease of each derived topic in the trend analysis. Results: Word frequency analysis showed that satisfaction, management, behavior, and service occurred frequently. In the word co-occurrence frequency analysis, 'service', 'satisfaction', and 'customer' were frequently associated with program and relationship. Topic modeling derived six topics: 'Beauty shop', 'Health education', 'Cosmetics', 'Customer satisfaction', 'Beauty education', and 'Beauty business'. The trend analysis confirmed that 'Beauty education' and 'Health education' are receiving more attention over time. Conclusions: Future studies must address the extreme polarization between the structure of the small beauty industry and beauty stores. Research should also explore various ways to improve the performance of internal personnel, and ways to maximize product capabilities, such as competitive cosmetics and brands, also need attention.
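One plausible reading of the OLS trend analysis is a simple linear regression of each topic's yearly share on the year, where the sign of the slope indicates whether the topic is rising or falling. The sketch below assumes that setup; the yearly shares are hypothetical, not the paper's data, and significance testing is not shown.

```python
def ols_fit(xs, ys):
    """Intercept and slope of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical yearly shares of two topics (not the paper's data).
years = [2016, 2017, 2018, 2019, 2020, 2021]
topic_share = {
    "Beauty education": [0.10, 0.12, 0.13, 0.15, 0.18, 0.20],
    "Cosmetics": [0.25, 0.24, 0.22, 0.22, 0.20, 0.19],
}
for topic, shares in topic_share.items():
    _, slope = ols_fit(years, shares)
    direction = "rising" if slope > 0 else "falling"
    print(f"{topic}: slope {slope:+.4f} per year ({direction})")
```

A positive slope corresponds to the kind of finding reported for 'Beauty education' and 'Health education'; in practice one would also check the slope's p-value before calling a trend real.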