• Title/Summary/Keyword: LDA topic model

Search Result 111

Data Analysis of Dropouts of University Students Using Topic Modeling (토픽모델링을 활용한 대학생의 중도탈락 데이터 분석)

  • Jeong, Do-Heon;Park, Ju-Yeon
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.1 / pp.88-95 / 2021
  • This study aims to inform student support policies by empirically analyzing data on university student dropouts. To this end, data on students enrolled at D University after 2017 were sampled and collected. The collected data were analyzed using topic modeling (LDA: Latent Dirichlet Allocation), a probabilistic text mining technique. The analysis identified topics characteristic of dropout students, and classification performance between groups based on these topics was also excellent. Based on these results, a specific educational support system was proposed to prevent university student dropout. This study is meaningful in that it demonstrates the use of text mining techniques in education and suggests an education policy based on data analysis.
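
As a rough illustration of what LDA estimates, the following is a minimal collapsed Gibbs sampler in plain Python. It is a sketch under assumptions, not the authors' implementation: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy documents are all hypothetical.

```python
import random

def lda_gibbs(docs, n_topics, n_iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of token lists.

    Returns (theta, phi, vocab): per-document topic distributions,
    per-topic word distributions, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic (up to a constant).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed estimates from the final counts.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
         for v in range(n_words)]
        for k in range(n_topics)
    ]
    return theta, phi, vocab

# Hypothetical, already-tokenized toy documents (not data from the study).
docs = [
    ["tuition", "loan", "tuition", "job"],
    ["major", "course", "grade", "course"],
    ["loan", "job", "tuition"],
    ["grade", "major", "course"],
]
theta, phi, vocab = lda_gibbs(docs, n_topics=2)
```

Each row of `theta` gives one document's topic mixture, which is the kind of per-document feature that could then feed a classifier separating groups.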

Reviews Analysis of Korean Clinics Using LDA Topic Modeling (토픽 모델링을 활용한 한의원 리뷰 분석과 마케팅 제언)

  • Kim, Cho-Myong;Jo, A-Ram;Kim, Yang-Kyun
    • The Journal of Korean Medicine / v.43 no.1 / pp.73-86 / 2022
  • Objectives: In the health care industry, the influence of online reviews is growing. As medical services are provided mainly by providers, those services have been managed by hospitals and clinics. However, direct promotion of medical services by providers is legally forbidden. For this reason, consumers such as patients and clients search many reviews on the Internet for information about hospitals, treatments, prices, etc. Online reviews can therefore be regarded as indicators of hospital quality, and they should be analyzed for sustainable hospital marketing. Method: Using a Python-based crawler, we collected more than 14,000 reviews written by real patients who had experienced Korean medicine. To extract the most representative words, the reviews were divided into positive and negative groups and then pre-processed to retain only nouns and adjectives, from which TF (Term Frequency), DF (Document Frequency), and TF-IDF (Term Frequency - Inverse Document Frequency) were computed. Finally, to derive topics from the reviews, the extracted words were analyzed using LDA (Latent Dirichlet Allocation). To avoid overlap between topics, the number of topics was set using Davis visualization. Results and Conclusions: Six and three topics were extracted from the positive and negative reviews, respectively, using the LDA topic model. The main factors underlying the topics were 1) response to patients and customers, 2) customized treatment (consultation) and management, and 3) the hospital/clinic environment.
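
The TF, DF, and TF-IDF quantities mentioned in this abstract can be sketched as follows. This is a minimal illustration assuming raw term counts and the common `log(N / df)` weighting; the abstract does not state which TF-IDF variant was used, and the function name and toy reviews are hypothetical.

```python
import math
from collections import Counter

def tf_df_tfidf(docs):
    """Compute TF, DF, and TF-IDF for a list of token lists.

    tf[i][term]    : raw count of term in document i
    df[term]       : number of documents containing term
    tfidf[i][term] : tf * log(N / df), one common weighting (assumed here)
    """
    n_docs = len(docs)
    tf = [Counter(doc) for doc in docs]
    df = Counter()
    for counts in tf:
        df.update(counts.keys())  # each document contributes at most 1 per term
    tfidf = [
        {term: count * math.log(n_docs / df[term]) for term, count in counts.items()}
        for counts in tf
    ]
    return tf, df, tfidf

# Hypothetical, already-tokenized toy reviews (not data from the study).
reviews = [
    ["kind", "doctor", "clean"],
    ["kind", "staff", "kind"],
    ["long", "wait", "doctor"],
]
tf, df, tfidf = tf_df_tfidf(reviews)
```

Terms that occur in every document receive a weight of zero under this weighting, which is why TF-IDF tends to surface words that distinguish one review from the rest.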

Improvement of recommendation system using attribute-based opinion mining of online customer reviews

  • Misun Lee;Hyunchul Ahn
    • Journal of the Korea Society of Computer and Information / v.28 no.12 / pp.259-266 / 2023
  • In this paper, we propose an algorithm that can improve the accuracy of collaborative filtering using attribute-based opinion mining (ABOM). For the experiment, 1,227 online consumer reviews of smartphone apps written by domestic smartphone users were analyzed. After morpheme analysis using the KKMA (Kkokkoma) analyzer and emotional word analysis using KOSAC, attribute extraction is performed using LDA topic modeling, and the topic modeling results for each weighted review are used to add up the ratings of collaborative filtering and the sentiment score. MAE, MAPE, and RMSE, which are statistical measures of average prediction error, were used for evaluation. Through experiments, we evaluated the prediction accuracy for online customers' app ratings (APP_Score) by combining traditional collaborative filtering, one of the recommendation algorithms, with the attribute-based opinion mining (ABOM) technique, which combines LDA attribute extraction and sentiment analysis. The analysis showed that ratings predicted by collaborative filtering with attribute-based opinion mining were more accurate than those predicted by traditional collaborative filtering.
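
The three error measures named in this abstract have standard definitions, sketched below in plain Python. The rating values are hypothetical and are not taken from the paper.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical app ratings and predictions.
actual = [4.0, 2.0, 5.0, 1.0]
predicted = [3.0, 2.0, 4.0, 2.0]

print(mae(actual, predicted))   # 0.75
print(mape(actual, predicted))  # approximately 36.25
print(rmse(actual, predicted))  # approximately 0.866
```

Lower values indicate better predictions for all three measures, so two recommenders can be compared by computing them on the same held-out ratings.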

Topic modeling and topic change trend analysis for advanced construction technologies (건설신기술에 대한 토픽 모델링 및 토픽 변화추이 분석)

  • Jeong, Seong Yun;Kim, Nam Gon
    • Smart Media Journal / v.10 no.4 / pp.102-110 / 2021
  • Currently, the advanced construction technology endorsement system is being operated to promote the development of domestic construction technology. We examined the implicit meanings inherent in advanced construction technologies by analyzing the relationships between highly important emerging vocabularies related to the technologies endorsed through this system. For this purpose, 918 cases of advanced construction technology information were collected. Based on the endorsed year and summary of each advanced construction technology, the importance of the emerging vocabularies was measured. Then, based on the LDA model, the degree of influence between related vocabularies was evaluated for each of the four topic areas. Topics according to the technical application fields were analyzed. From 1990 to 2021, the trend of changes in highly influential vocabularies by each topic was inferred. Future changes in the degree of influence of the topics of environment, machinery, facilities, and maintenance and reinforcement of structures, and of related technology fields, were predicted.

Analysis on Status and Trends of SIAM Journal Papers using Text Mining (텍스트마이닝 기법을 활용한 미국산업응용수학 학회지의 연구 현황 및 동향 분석)

  • Kim, Sung-Yeun
    • The Journal of the Korea Contents Association / v.20 no.7 / pp.212-222 / 2020
  • The purpose of this study is to understand the current status and trends of research published by the Society for Industrial and Applied Mathematics, a worldwide leader in the field of industrial mathematics. To this end, titles and abstracts were collected from 6,255 research articles published between 2016 and 2019, and the R program was used to fit an LDA topic model and a regression model. The analyses showed, first, that a variety of subjects have been studied in industrial mathematics, such as algebra, discrete mathematics, geometry, topological mathematics, and probability and statistics. Second, the ascending research subjects were fluid mechanics, graph theory, and stochastic differential equations, and the descending research subjects were computational theory and classical geometry. By clarifying the overall flows and changes in the intellectual structure of industrial mathematics, the results are expected to give researchers implications for the future direction of research and for building an industrial mathematics curriculum that reflects the zeitgeist in education.
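
One common way to label topics as ascending or descending is to regress each topic's yearly share on the year and inspect the sign of the slope. The sketch below assumes that approach; the abstract does not specify the regression model, and the topic names and shares are hypothetical.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical yearly mean topic proportions (not results from the paper).
years = [2016, 2017, 2018, 2019]
topic_share = {
    "topic_a": [0.10, 0.12, 0.15, 0.17],
    "topic_b": [0.20, 0.18, 0.15, 0.14],
}

slopes = {name: ols_slope(years, shares) for name, shares in topic_share.items()}
trend = {name: "ascending" if s > 0 else "descending" for name, s in slopes.items()}
```

With only four yearly points the slope's sign is a weak signal, so a real analysis would also check statistical significance before labeling a topic.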

Analysis of Research Trends in Cloud Security Using Topic Modeling and Time-Series Analysis: Focusing on NTIS Projects (토픽모델링과 시계열 분석을 활용한 클라우드 보안 분야 연구 동향 분석 : NTIS 과제를 중심으로)

  • Sun Young Yun;Nam Wook Cho
    • Convergence Security Journal / v.24 no.2 / pp.31-38 / 2024
  • Recent expansion in cloud service usage has heightened the importance of cloud security. The purpose of this study is to analyze current research trends in the field of cloud security and to derive implications. To this end, R&D project data provided by the National Science and Technology Knowledge Information Service (NTIS) from 2010 to 2023 was utilized to analyze trends in cloud security research. Fifteen core topics in cloud security research were identified using LDA topic modeling and ARIMA time series analysis. Key areas identified in the research include AI-powered security technologies, privacy and data security, and solving security issues in IoT environments. This highlights the need for research to address security threats that may arise due to the proliferation of cloud technologies and the digital transformation of infrastructure. Based on the derived topics, the field of cloud security was divided into four categories to define a technology reference model, which was improved through expert interviews. This study is expected to guide the future direction of cloud security development and provide important guidelines for future research and investment in academia and industry.

Non-Simultaneous Sampling Deactivation during the Parameter Approximation of a Topic Model

  • Jeong, Young-Seob;Jin, Sou-Young;Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.1 / pp.81-98 / 2013
  • Since Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) were introduced, many revised or extended topic models have appeared. Because the likelihood of these models is intractable, training any topic model requires an approximation algorithm such as variational approximation, Laplace approximation, or Markov chain Monte Carlo (MCMC). Although these approximation algorithms perform well, training a topic model is still computationally expensive given the large amount of data it requires. In this paper, we propose a new method, called non-simultaneous sampling deactivation, for efficient approximation of parameters in a topic model. In traditional approximation algorithms, every random variable is sampled or obtained under a single predefined burn-in period; our method is instead based on the observation that the random variable nodes in a topic model all have different convergence periods. During the iterative approximation process, the proposed method allows each random variable node to be terminated or deactivated once it has converged. Therefore, compared to traditional approaches in which every node is usually deactivated concurrently, the proposed method improves inference efficiency in terms of time and memory. We do not propose a new approximation algorithm, but a new process applicable to existing approximation algorithms. Through experiments, we show the time and memory efficiency of the method and discuss the tradeoff between the efficiency of the approximation process and parameter consistency.

Topic change monitoring study based on Blue House national petition using a control chart (관리도를 활용한 국민청원 토픽 모니터링 연구)

  • Lee, Heeyeon;Choi, Jieun;Lee, Sungim;Son, Won
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.795-806 / 2021
  • Recently, as the volume of text data from online channels has grown, there is increasing interest in research that summarizes and analyzes such data. One of the fundamental analyses of text data is to extract latent topics. Although a researcher may read all the data and summarize the contents one by one, this is not easy with large amounts of data. Blei and Lafferty (2007) and Blei et al. (2003) proposed topic modeling methods for extracting topics using a statistical model. Since text data are generally collected over time, it is worthwhile to monitor changes in topics. In this study, we propose a topic index based on the results of the topic model. In addition, a control chart, a representative tool for statistical process control, is applied to monitor the topic index over time. As a practical example, we use text data collected from Blue House National Petition boards between March 5, 2018, and March 5, 2020.
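
Monitoring an index with a control chart can be sketched as below. This assumes a basic Shewhart-style chart with limits at the baseline mean plus or minus three sample standard deviations; the abstract does not say which chart the authors used, and the index values are hypothetical.

```python
import statistics

def control_limits(baseline, k=3.0):
    """Return (LCL, center line, UCL) estimated from an in-control baseline."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # sample standard deviation
    return center - k * sigma, center, center + k * sigma

def out_of_control(series, lcl, ucl):
    """Indices of observations that fall outside the control limits."""
    return [i for i, x in enumerate(series) if x < lcl or x > ucl]

# Hypothetical topic index values (e.g., a topic's share of petitions per period).
baseline = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.12, 0.09]
monitored = [0.11, 0.10, 0.25, 0.12]

lcl, center, ucl = control_limits(baseline)
signals = out_of_control(monitored, lcl, ucl)
print(signals)  # [2]
```

A signal marks a period in which the topic index departs from its baseline behavior, which is the cue to inspect the underlying texts for a topic change.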

Wireless Earphone Consumers Using LDA Topic Modeling Comparative Analysis of Purchase Intention and Satisfaction: Focused on Samsung and Apple wireless earphone reviews in Coupang (LDA 토픽 모델링을 활용한 무선이어폰 소비자 구매 의도 및 만족도 비교 분석: 쿠팡에서의 삼성과 애플 무선이어폰 리뷰를 중심으로)

  • Tuul Yondon;Tae-Gu Kang
    • Journal of Industrial Convergence / v.21 no.8 / pp.23-33 / 2023
  • Consumer review analysis is important for product development, customer satisfaction, competitive advantage, and effective marketing. The wireless earphone market is expected to reach $45.7 billion by 2026 as use increases with changing lifestyles. Therefore, in consideration of the growth and importance of the market, consumer reviews of wireless earphones from Apple and Samsung were analyzed. In this study, 11,320 reviews of Apple and Samsung wireless earphones sold on Coupang were collected, and consumers' purchase intentions and satisfaction were analyzed through text mining: frequency analysis, sentiment analysis, and an LDA topic model. Topic modeling yielded 16 topics, which were classified into sound quality, connection, shopping mall service, purchase intention, battery, delivery, and price. In the brand comparison, Samsung earphones were frequently purchased as gifts and had a high positive sentiment for price, while Apple had a high positive sentiment for battery, sound quality, connection, service, and delivery. The results can serve related stakeholders, including manufacturers, retailers, marketers, and consumers, as a source of improvements and insights into customer satisfaction, quality, and market trends.

Analysis on the Trend of The Journal of Information Systems Using TLS Mining (TLS 마이닝을 이용한 '정보시스템연구' 동향 분석)

  • Yun, Ji Hye;Oh, Chang Gyu;Lee, Jong Hwa
    • The Journal of Information Systems / v.31 no.1 / pp.289-304 / 2022
  • Purpose: The development of the network and mobile industries has induced companies to invest in information systems, leading a new industrial revolution. The Journal of Information Systems (JIS), which developed the information system field into a theoretical and practical discipline in the 1990s, has a 30-year history in information systems. This study aims to identify the academic value and research trends of JIS by analyzing those trends. Design/methodology/approach: This study analyzes the trend of JIS by combining several methods, collectively named TLS mining analysis. TLS mining analysis consists of a series of analyses: the Term Frequency-Inverse Document Frequency (TF-IDF) weight model, Latent Dirichlet Allocation (LDA) topic modeling, and text mining with Semantic Network Analysis. First, keywords are extracted from the research data using the TF-IDF weight model; then, topic modeling is performed using the LDA algorithm to identify issue keywords. Findings: The study used the summary service for published research papers provided by the Korea Citation Index to analyze JIS. The 714 papers published from 2002 to 2021 were divided into two periods: 2002-2011 and 2012-2021. In the first period (2002-2011), research in the information system field focused on e-business strategies, as most companies adopted online business models. In the second period (2012-2021), data-based information technology and new industrial revolution technologies such as artificial intelligence, SNS, and mobile were the main research issues. In addition, keywords for improving the JIS citation index were presented.