• Title/Summary/Keyword: Latent dirichlet allocation

Reviews Analysis of Korean Clinics Using LDA Topic Modeling (토픽 모델링을 활용한 한의원 리뷰 분석과 마케팅 제언)

  • Kim, Cho-Myong;Jo, A-Ram;Kim, Yang-Kyun
    • The Journal of Korean Medicine / v.43 no.1 / pp.73-86 / 2022
  • Objectives: In the health care industry, the influence of online reviews is growing. Medical services are delivered mainly by providers and have therefore been managed by hospitals and clinics, but direct promotion of medical services by providers is legally forbidden. For this reason, consumers such as patients and clients search many online reviews for information about hospitals, treatments, prices, and so on. Online reviews can thus be regarded as indicators of hospital quality, and they should be analyzed for sustainable hospital marketing. Methods: Using a Python-based crawler, we collected more than 14,000 reviews written by real patients who had experienced Korean medicine. To extract the most representative words, the reviews were divided into positive and negative groups and then pre-processed to retain only nouns and adjectives, from which TF (Term Frequency), DF (Document Frequency), and TF-IDF (Term Frequency - Inverse Document Frequency) were computed. Finally, topics were derived from the extracted words using LDA (Latent Dirichlet Allocation). To avoid overlap between topics, the number of topics was set by Davis visualization. Results and Conclusions: The LDA topic model extracted six topics from the positive reviews and three from the negative reviews. The main factors underlying the topics were 1) response to patients and customers, 2) customized treatment (consultation) and management, and 3) the hospital or clinic environment.
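
The TF, DF, and TF-IDF statistics mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which TF-IDF variant or tooling the authors used, so the unsmoothed formula `tf * log(N / df)`, the function name `tf_df_tfidf`, and the sample reviews below are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_df_tfidf(docs):
    """Return (tf, df, tfidf) for a list of tokenized documents.

    tf    : corpus-wide count of each term
    df    : number of documents that contain each term
    tfidf : tf * log(N / df), one common unsmoothed variant (an assumption)
    """
    n_docs = len(docs)
    tf = Counter(token for doc in docs for token in doc)
    df = Counter(token for doc in docs for token in set(doc))
    tfidf = {term: tf[term] * math.log(n_docs / df[term]) for term in tf}
    return tf, df, tfidf


# Hypothetical, already-tokenized reviews (nouns and adjectives only).
reviews = [
    ["kind", "doctor", "clean", "clinic"],
    ["kind", "staff", "long", "wait"],
    ["clean", "clinic", "kind", "explanation"],
]
tf, df, tfidf = tf_df_tfidf(reviews)

# Terms that appear in every document get a TF-IDF of zero under this variant.
for term, score in sorted(tfidf.items(), key=lambda item: -item[1]):
    print(f"{term}: tf={tf[term]} df={df[term]} tfidf={score:.3f}")
```

Under this variant a word that occurs in every review, such as "kind" here, scores zero, which is why TF-IDF is often preferred over raw frequency for picking representative words.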

Text Network Analysis on Stalking-Related News Articles (스토킹 관련 언론기사에 대한 텍스트네트워크분석)

  • Eun-Sun Ji;Sang-Hee Jeong
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.579-585 / 2023
  • The purpose of this study is to explore keywords in stalking-related news articles by political orientation through text network analysis, and then to examine their implicit intentions. A total of 1,607 articles published from January 1, 2018 to December 31, 2022 were selected: 824 from the conservative press (The Chosun Ilbo, The Joongang Ilbo) and 783 from the progressive press (The Hankyoreh, The Kyunghyang Shinmun). Topic categories were then derived using topic modeling based on LDA (Latent Dirichlet Allocation). The topics common to the conservative and progressive press were improving the perception of gender-based violence, personal protection and intensity of punishment, and disclosure of stalkers' personal information. Among the topics that differed, the conservative press covered stalkers' harmful acts and an outline of the 'murder case at Sindang Station', while the progressive press covered calls for aggravated punishment in the 'murder case at Sindang Station' and the eradication of sexual exploitation crime (in cyberspace). These results imply that the type of reporting on stalking in news articles varies with ideological stance.

Analysis of service strategies through changes in Messenger application reviews during the pandemic: focusing on topic modeling (팬데믹 기간 Messenger 애플리케이션 리뷰 변화를 통한 서비스 전략 분석 : 토픽 모델링을 중심으로)

  • YuNa Lee;Mijin Noh;YangSok Kim;MuMoungCho Han
    • Smart Media Journal / v.12 no.6 / pp.15-26 / 2023
  • As face-to-face communication has become difficult due to the COVID-19 pandemic, studies have been conducted to understand the impact of non-face-to-face communication, but there is a lack of research that examines this through messenger application reviews. This study aims to identify the impact of the pandemic through Latent Dirichlet Allocation (LDA) topic modeling by collecting review data of messenger applications in the Google Play Store, and to suggest service strategies accordingly. The study categorized the data based on when the pandemic started and the ratings given by users. The analysis showed that messenger is mainly used by middle-aged and older people, and that family communication increased after the pandemic. Users expressed frustration with the application's updates and found it difficult to adapt to the changes. This calls for a development approach that adjusts the frequency of updates and actively listens to user feedback. Also, providing an intuitive and simple user interface (UI) is expected to improve user satisfaction.

Evaluation of Topic Modeling Performance for Overseas Construction Market Analysis Using LDA and BERTopic on News Articles (LDA 및 BERTopic 기반 해외건설시장 뉴스 기사 토픽모델링 성능평가)

  • Baik, Joonwoo;Chung, Sehwan;Chi, Seokho
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.6 / pp.811-819 / 2023
  • Understanding the local conditions is a crucial factor in enhancing the success potential of overseas construction projects. This can be achieved through the analysis of news articles of the target market using topic modeling techniques. In this study, the authors aimed to analyze news articles using two topic modeling methods, namely Latent Dirichlet Allocation (LDA) and BERTopic, in order to determine the optimal approach for market condition analysis. To evaluate the alignment between the generated topics and the actual themes of the news documents, the research collected 6,273 BBC news articles, created ground truth data for individual news article topics, and finally compared this ground truth with the results of the topic modeling. The F1 score for LDA was 0.011, while BERTopic achieved a score of 0.244. These results indicate that BERTopic more accurately reflected the actual topics of news articles, making it more effective for understanding the overseas construction market.
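
To show how topic assignments can be scored against ground truth labels, here is a minimal sketch of a macro-averaged F1 score. The abstract does not state which F1 averaging the authors used or how topics were mapped to labels, so the macro averaging, the function name `macro_f1`, and the sample labels are assumptions.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per label, then take the unweighted mean."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical ground truth topics and model-assigned topics for four articles.
truth = ["economy", "politics", "economy", "energy"]
predicted = ["economy", "economy", "economy", "energy"]
print(f"macro F1 = {macro_f1(truth, predicted):.3f}")
```

A label that the model never predicts, such as "politics" in this example, contributes an F1 of zero and pulls the macro average down.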

An Analysis of Artificial Intelligence Education Research Trends Based on Topic Modeling

  • You-Jung Ko
    • Journal of the Korea Society of Computer and Information / v.29 no.2 / pp.197-209 / 2024
  • This study aimed to analyze recent research trends in Artificial Intelligence (AI) education within South Korea with the overarching objective of exploring the future direction of AI education. For this purpose, 697 papers related to AI education published in Research Information Sharing Service (RISS) from 2016 to November 2023 were analyzed using word clouds and the Latent Dirichlet Allocation (LDA) topic modeling technique. As a result of the analysis, six major topics were identified: generative AI utilization education, AI ethics education, AI convergence education, teacher perceptions and roles in AI utilization, AI literacy development in university education, and AI-based education and research directions. Based on these findings, I proposed several suggestions, including (1) expanding the use of generative AI in various subjects, (2) establishing ethical guidelines for AI use, (3) evaluating the long-term impact of AI education, (4) enhancing teachers' ability to use AI in higher education, (5) diversifying the curriculum of AI education in universities, and (6) analyzing the trend of AI research and developing an educational platform.

The Trends of Eco-Friendly Textiles Using Big Data from Newspaper Articles (신문기사 빅데이터를 활용한 친환경 섬유의 추이에 관한 연구)

  • Nam Beom Cho;Choong Kwon Lee
    • Smart Media Journal / v.13 no.2 / pp.95-107 / 2024
  • The development of environmentally friendly products and services has become a trend, and the development and utilization of eco-friendly textiles with economic value is gaining attention as a new business model. Analyzing and identifying trends and developments in eco-friendly textiles can provide important information and insights for various stakeholders such as companies, governments, and consumers to help them achieve sustainable growth. For this study, we collected and analyzed data from newspaper articles mainly covering the textile and fashion sector from 2000 to June 2023. A total of 12,331 articles containing the keyword 'eco-friendly textiles' were collected, and after performing morphological analysis on the extracted data, Latent Dirichlet Allocation and Dynamic Topic Modeling analysis were performed to identify topics by year. The results of the study are expected to provide strategic guidance and insights for the sustainable development of the textile industry, thereby helping to promote the research, development, and commercialization of eco-friendly textiles.

KOSPI index prediction using topic modeling and LSTM

  • Jin-Hyeon Joo;Geun-Duk Park
    • Journal of the Korea Society of Computer and Information / v.29 no.7 / pp.73-80 / 2024
  • In this paper, we propose a method to improve the accuracy of predicting the Korea Composite Stock Price Index (KOSPI) by combining topic modeling and Long Short-Term Memory (LSTM) neural networks. We use the Latent Dirichlet Allocation (LDA) technique to extract ten major topics related to interest rate increases and decreases from financial news data. The extracted topics, along with historical KOSPI index data, are input into an LSTM model to predict the KOSPI index. The proposed model thus combines time series prediction, by feeding the historical KOSPI index into the LSTM model, with topic modeling, by feeding in news data. To verify the performance of the proposed model, this paper designs four models (LSTM_K model, LSTM_KNS model, LDA_K model, LDA_KNS model) based on the types of input data for the LSTM and presents the predictive performance of each model. The comparison of prediction performance shows that the LSTM model (LDA_K model), which uses financial news topic data and historical KOSPI index data as inputs, recorded the lowest RMSE (Root Mean Square Error), demonstrating the best predictive performance.
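
The RMSE metric used to compare the four models can be sketched as follows. This is only an illustration of the metric itself; the function name `rmse` and the index values are hypothetical and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical index values, for illustration only.
actual_index = [2500.0, 2510.0, 2490.0]
predicted_index = [2503.0, 2506.0, 2490.0]
print(f"RMSE = {rmse(actual_index, predicted_index):.3f}")
```

Because the errors are squared before averaging, a lower RMSE means the predictions stay closer to the actual index, with large misses penalized more heavily than small ones.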

Performance Improvement of Topic Modeling using BART based Document Summarization (BART 기반 문서 요약을 통한 토픽 모델링 성능 향상)

  • Eun Su Kim;Hyun Yoo;Kyungyong Chung
    • Journal of Internet Computing and Services / v.25 no.3 / pp.27-33 / 2024
  • The environment of academic research is continuously changing due to the increase of information, which raises the need for an effective way to analyze and organize large amounts of documents. In this paper, we propose a method for improving topic modeling performance using BART (Bidirectional and Auto-Regressive Transformers) based document summarization. The proposed method uses a BART-based document summarization model to extract the core content and improves topic modeling performance with the LDA (Latent Dirichlet Allocation) algorithm. We suggest an approach to improve the performance and efficiency of LDA topic modeling through document summarization and validate it through experiments. The experimental results show that the BART-based model for summarizing article data captures the important information of the original articles, with F1 scores of 0.5819, 0.4384, and 0.5038 in the ROUGE-1, ROUGE-2, and ROUGE-L evaluations, respectively. In addition, topic modeling using summarized documents performs about 8.08% better than topic modeling using full text when compared on the perplexity metric. This contributes to the reduction of data throughput and improvement of efficiency in the topic modeling process.
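
For readers unfamiliar with the perplexity metric, the sketch below computes it from per-token log-probabilities and then expresses the gap between two models as a relative reduction. The abstract does not explain how the 8.08% figure was derived, so treating it as a relative reduction in perplexity is an assumption, as are the function names and the numbers used.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-(1/N) * sum of natural-log token probabilities).

    Lower values mean the model assigns higher probability to the held-out text.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_improvement(baseline, candidate):
    """Percentage by which `candidate` perplexity is lower than `baseline`."""
    return (baseline - candidate) / baseline * 100.0


# Hypothetical per-token log-probabilities for two models on held-out text.
full_text_model = [math.log(0.20)] * 5
summary_model = [math.log(0.25)] * 5

pp_full = perplexity(full_text_model)
pp_summary = perplexity(summary_model)
print(f"full text: {pp_full:.2f}, summarized: {pp_summary:.2f}")
print(f"relative improvement: {relative_improvement(pp_full, pp_summary):.1f}%")
```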

A Study on the Research Topics and Trends in South Korea: Focusing on Particulate Matter (토픽모델링을 이용한 국내 미세먼지 연구 분류 및 연구동향 분석)

  • Park, Hyemin;Kim, Taeyong;Kwon, Daewoong;Heo, Junyong;Lee, Juyeon;Yang, Minjune
    • Korean Journal of Remote Sensing / v.38 no.5_3 / pp.873-885 / 2022
  • Particulate matter (PM) has emerged as a hot topic around the world, as PM has been reported to be related to increases in mortality and prevalence rates. In South Korea, the importance of PM has been recognized since the late 1990s, and various studies on PM have been conducted. This study investigated the PM research topics and trends for papers (D=2,764) published in Research Information Sharing Service (RISS) using topic modeling based on Latent Dirichlet Allocation (LDA). As a result, a total of 10 topics were identified across all papers, and the PM research topics were classified as 'PM reduction (Topic 1)', 'Government policy and management (Topic 2)', 'Characteristics of PM (Topic 3)', 'PM model (Topic 4)', 'Environmental education (Topic 5)', 'Bio (Topic 6)', 'Traffic (Topic 7)', 'Asian dust (Topic 8)', 'Indoor PM (Topic 9)', and 'Human risk (Topic 10)'. In particular, the proportion of papers on the topics 'Government policy and management (Topic 2)', 'PM model (Topic 4)', 'Environmental education (Topic 5)', and 'Bio (Topic 6)' relative to the total number of papers increased over time (linear slope > 0). The results of this study provide a new literature review methodology for particulate matter, together with its research history and insights.
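
The "linear slope > 0" criterion for a rising topic can be illustrated with an ordinary least-squares slope of a topic's yearly share of papers. The abstract does not describe the fitting procedure, so the simple least-squares fit, the function name `linear_slope`, and the yearly shares below are assumptions for illustration.

```python
def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical share of papers assigned to one topic in each year.
years = [2018, 2019, 2020, 2021, 2022]
topic_share = [0.05, 0.07, 0.08, 0.11, 0.12]

slope = linear_slope(years, topic_share)
trend = "increasing" if slope > 0 else "not increasing"
print(f"slope = {slope:.4f} per year ({trend})")
```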

Latent Keyphrase Extraction Using LDA Model (LDA 모델을 이용한 잠재 키워드 추출)

  • Cho, Taemin;Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.2 / pp.180-185 / 2015
  • As the number of document resources continuously increases, automatically extracting keyphrases from a document has recently become one of the main issues. However, most previous works have tried to extract keyphrases from the words in documents, so they overlooked latent keyphrases, which do not appear in the documents. Although latent keyphrases do not appear in documents, they can play an important role in text summarization and information retrieval because they imply meaningful concepts or contents of documents. They also account for more than one fourth of all keyphrases in real-world datasets, and they can be utilized for short articles such as SNS posts, which rarely have explicit keyphrases. In this paper, we propose a new approach that selects candidate keyphrases from the keyphrases of neighbor documents that are similar to the given document and evaluates the importance of the candidates using the individual words in the candidates. Experimental results show that latent keyphrases can be extracted at a reasonable level.