• Title/Abstract/Keywords: Topic Modeling(LDA)


Data Analysis of Dropouts of University Students Using Topic Modeling (토픽모델링을 활용한 대학생의 중도탈락 데이터 분석)

  • Jeong, Do-Heon;Park, Ju-Yeon
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.1 / pp.88-95 / 2021
  • This study aims to provide implications for establishing student support policies by empirically analyzing data on university student dropouts. To this end, data on students enrolled at D University after 2017 were sampled and collected. The collected data were analyzed using the topic modeling technique LDA (Latent Dirichlet Allocation), a probabilistic model used in text mining. The analysis found topics characteristic of dropout students, and the classification performance between groups based on these topics was also excellent. Based on these results, a specific educational support system was proposed to prevent university student dropout. This study is meaningful in that it demonstrates the use of text mining techniques in the education field and suggests an education policy based on data analysis.
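
As a rough illustration of what "LDA as a probabilistic model" means in practice, the following is a minimal collapsed Gibbs sampling sketch. It is not the implementation used in the study above; the function name `lda_gibbs`, the toy documents, and the hyperparameters `alpha` and `beta` are assumptions made only for this example.

```python
import random

def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word

# Hypothetical tokenized records, invented for this sketch.
docs = [
    ["tuition", "leave", "tuition", "job"],
    ["club", "grade", "scholarship"],
    ["job", "leave", "tuition"],
]
theta, topic_word = lda_gibbs(docs, n_topics=2, n_iters=50)
```

Each row of `theta` is one document's estimated topic mixture; such per-document topic proportions are the kind of feature that could then be fed to a classifier separating groups.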

The Analysis of North Korea's Economic Policy Trends through Topic Modeling (토픽모델링을 통한 북한의 경제정책 동향 분석)

  • Kang, Kyung Hwa
    • Smart Media Journal / v.9 no.4 / pp.44-51 / 2020
  • The North Korean economy has changed considerably since the mid-to-late 1990s, and the change has become more pronounced since Kim Jong Un took power in 2012. This paper therefore tracks the trend of economic policy over time. I use LDA topic modeling, a text mining method, to analyze "Economic Research," a representative economics journal published in North Korea. Given its unrivaled position as an economic journal produced in North Korea, an in-depth analysis of "Economic Research" is essential for tracking the realities and limitations facing the economy and the alternatives that North Korean authorities are aware of. Through "Economic Research," in which various topics of debate on the North Korean economy are embedded, the paper examines the flow of the North Korean leader's economic policy and analyzes the content of the "change" intended by the current Kim Jong Un regime.

Topic Modeling to Identify Cloud Security Trends using news Data Before and After the COVID-19 Pandemic (뉴스 데이터 토픽 모델링을 활용한 COVID-19 대유행 전후의 클라우드 보안 동향 파악)

  • Soun U Lee;Jaewoo Lee
    • Convergence Security Journal / v.22 no.2 / pp.67-75 / 2022
  • Due to the COVID-19 pandemic, many companies have introduced remote work. However, remote work has increased attacks that target companies to gain access to sensitive information, and many companies have begun to use cloud services to respond to security threats. This study collected news data with the keyword 'cloud security' and applied LDA topic modeling to analyze changes in domestic cloud security trends before and after the COVID-19 pandemic. Before the pandemic, interest in domestic cloud security was low, so no representative themes or associations could be found in the extracted topics. However, the analysis indicated that cloud adoption is necessary to provide the high computing performance required by AI, IoT, and blockchain, IT technologies that are currently being studied. In contrast, the topics extracted after the COVID-19 pandemic confirmed that interest in the cloud increased in Korea and that interest in cloud security increased accordingly. Therefore, security measures should be established to prepare for the ever-increasing use of cloud services.

Application of a Topic Model on the Korea Expressway Corporation's VOC Data (한국도로공사 VOC 데이터를 이용한 토픽 모형 적용 방안)

  • Kim, Ji Won;Park, Sang Min;Park, Sungho;Jeong, Harim;Yun, Ilsoo
    • Journal of Information Technology Services / v.19 no.6 / pp.1-13 / 2020
  • Recently, 80% of big data consists of unstructured text data. In particular, large volumes of unstructured documents are being stored through social network services (SNS), blogs, news, etc., and the importance of unstructured data is being highlighted. As the potential for using unstructured data increases, various analysis techniques such as text mining have appeared. In this study, a topic modeling technique was therefore applied to the Korea Expressway Corporation's voice of customer (VOC) data, which include customer opinions and complaints. Currently, VOC data are divided according to the business areas of the Korea Expressway Corporation. However, the assigned categories are often inaccurate, and ambiguous items are classified as "other". Therefore, to use VOC data for efficient service improvement, a more systematic and efficient classification method is required. To this end, this study proposed two approaches: a method using only latent Dirichlet allocation (LDA), the most representative topic modeling technique, and a new method combining LDA with the word embedding technique Word2vec. The results confirmed that the categories of VOC data are classified relatively well when the new method is used. These results are expected to make it possible to derive implications for the Korea Expressway Corporation and use them for service improvement.

Online Reviews Analysis for Prediction of Product Ratings based on Topic Modeling (토픽 모델링에 기반한 온라인 상품 평점 예측을 위한 온라인 사용 후기 분석)

  • Park, Sang Hyun;Moon, Hyun Sil;Kim, Jae Kyeong
    • Journal of Information Technology Services / v.16 no.3 / pp.113-125 / 2017
  • Customers are affected by others' opinions when they make a purchase. Thanks to the development of technology, people share their experiences, such as reviews or ratings, through online or social network services. However, although ratings are intuitive information for others, many reviews include only text without ratings. Also, because of the huge number of reviews, customers and companies cannot read all of them, so it is hard to evaluate a product without ratings. Therefore, in this study, we propose a methodology to predict ratings for a product based on its reviews. In this methodology, we first estimate the topic-review matrix using the Latent Dirichlet Allocation technique, which is widely used in topic modeling. Next, we predict ratings from the topic-review matrix using an artificial neural network model based on the backpropagation algorithm. Through experiments with actual reviews, we find that our methodology can predict ratings based on customers' reviews, and that it performs better with reviews that include certain opinions. As a result, our study can be used by customers and companies that want to understand a product accurately through ratings. Moreover, we hope that our study leads to future studies that combine machine learning and topic modeling.

Unstructured Data Processing Using Keyword-Based Topic-Oriented Analysis (키워드 기반 주제중심 분석을 이용한 비정형데이터 처리)

  • Ko, Myung-Sook
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.521-526 / 2017
  • Big data comes in diverse formats and vast volumes and is generated very quickly, so it requires new management and analysis methods rather than traditional data processing methods. Text mining techniques can be used to extract useful information from unstructured text written in human language in online documents on social networks. Identifying trends in messages about politics, the economy, and culture left on social media helps in understanding which topics people are interested in. In this study, text mining was performed on online news related to a given keyword using a topic-oriented analysis technique. We use Latent Dirichlet Allocation (LDA) to extract information from web documents and analyze which topics are of interest for a given keyword and which topics are related to which core values.

A Study on the Research Trends in the Fourth Industrial Revolution in Korea Using Topic Modeling (토픽모델링을 활용한 4차 산업혁명 분야의 국내 연구 동향 분석)

  • Gi Young Kim;Dong-Jo Noh
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.34 no.4 / pp.207-234 / 2023
  • Since the advent of the Fourth Industrial Revolution, related studies have been conducted in various fields, including industrial fields. In this study, to analyze domestic research trends on the Fourth Industrial Revolution, keyword analysis and topic modeling analysis based on the LDA algorithm were conducted on 2,115 papers included in the KCI from January 2016 to August 2023. The results are as follows. First, the journals that published more than 30 academic papers related to the Fourth Industrial Revolution were digital convergence research, humanities society 21, e-business research, and learner-centered subject education research. Second, the topic modeling analysis yielded seven topics: "human and artificial intelligence," "data and personal information management," "curriculum change and innovation," "corporate change and innovation," "education change and jobs," "culture and arts and content," and "information and corporate policies and responses." Third, common research topics related to the Fourth Industrial Revolution are "change in the curriculum," "human and artificial intelligence," and "culture arts and content," and common keywords include "company," "information," "protection," "smart," and "system." Fourth, in the first half of the research period (2016-2019), topics in the field of education appeared at the top, but in the second half (2020-2023), topics related to corporate, smart, digital, and service innovation appeared at the top. Fifth, research topics tended to become more specific or subdivided in the second half of the period. This trend is interpreted as a result of the socioeconomic changes that occurred as core technologies of the Fourth Industrial Revolution were applied and utilized in various industrial fields after the COVID-19 pandemic. The results of this study are expected to provide useful information for identifying research trends in the field of the Fourth Industrial Revolution, establishing strategies, and conducting subsequent research.

Big Data Analysis of Busan Civil Affairs Using the LDA Topic Modeling Technique (LDA 토픽모델링 기법을 활용한 부산시 민원 빅데이터 분석)

  • Park, Ju-Seop;Lee, Sae-Mi
    • Informatization Policy / v.27 no.2 / pp.66-83 / 2020
  • Local issues that occur in cities typically garner great attention from the public. While local governments strive to resolve these issues, it is often difficult to resolve them all, which leads to complaints. In tackling these issues, it is imperative for local governments to use big data to identify the nature of complaints and proactively provide solutions. This study applies the LDA topic modeling technique to analyze trends and patterns in complaints filed online. To this end, 9,625 online complaints submitted to the city of Busan from 2015 to 2017 were analyzed, and 20 topics were identified. From these topics, key topics were singled out, and through analysis of quarterly weighting trends, four "hot" topics (bus stops, taxi drivers, praise, and administrative handling) and four "cold" topics (CCTV installation, bus routes, park facilities including parking, and festival-related issues) were highlighted. The study conducted big data analysis to identify trends and patterns in civil affairs and makes an academic contribution by encouraging follow-up research. Moreover, the text mining technique used for complaint analysis can be used for other projects requiring big data processing.
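
The abstract above does not say how the quarterly weighting trends were measured. One common way to label "hot" and "cold" topics is to fit a straight line to each topic's average weight per quarter and look at the sign of the slope. The sketch below assumes that approach; the topic names, the weights, and the `threshold` value are hypothetical.

```python
def slope(values):
    """Ordinary least-squares slope of values against their index 0..n-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

def label_topics(quarterly_weights, threshold=0.005):
    """Label each topic by the direction of its quarterly weight trend."""
    labels = {}
    for topic, weights in quarterly_weights.items():
        s = slope(weights)
        if s > threshold:
            labels[topic] = "hot"      # weight rising over time
        elif s < -threshold:
            labels[topic] = "cold"     # weight falling over time
        else:
            labels[topic] = "stable"   # no clear trend
    return labels

# Hypothetical mean topic weights for four consecutive quarters.
quarterly_weights = {
    "topic_a": [0.04, 0.05, 0.07, 0.08],
    "topic_b": [0.09, 0.07, 0.06, 0.04],
    "topic_c": [0.05, 0.05, 0.05, 0.05],
}
labels = label_topics(quarterly_weights)
```

With real data, a significance test on the slope would be a more careful criterion than a fixed threshold.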

An Exploratory Study of Generative AI Service Quality using LDA Topic Modeling and Comparison with Existing Dimensions (LDA토픽 모델링을 활용한 생성형 AI 챗봇의 탐색적 연구 : 기존 AI 챗봇 서비스 품질 요인과의 비교)

  • YaeEun Ahn;Jungsuk Oh
    • Journal of Service Research and Studies / v.13 no.4 / pp.191-205 / 2023
  • Artificial Intelligence (AI), especially in the domain of text-generative services, has witnessed a significant surge, with forecasts indicating the AI-as-a-Service (AIaaS) market reaching a valuation of $55.0 billion by 2028. This research set out to explore the quality dimensions characterizing synthetic text media software, with a focus on four key players in the industry: ChatGPT, Writesonic, Jasper, and Anyword. Drawing from a dataset of over 4,000 reviews sourced from a software evaluation platform, the study employed the Latent Dirichlet Allocation (LDA) topic modeling technique using the Gensim library. This process grouped the data into 11 distinct topics. Subsequent analysis compared these topics against established AI service quality dimensions, specifically AICSQ and AISAQUAL. Notably, the reviews predominantly emphasized dimensions like availability and efficiency, while others, such as anthropomorphism, which have been underscored in prior literature, were absent. This observation is attributed to the inherent nature of the AI service reviews examined, which lean more towards semantic understanding than direct user interaction. The study acknowledges inherent limitations, mainly potential biases stemming from the single review source and the specific nature of the reviewer demographic. Possible future research includes gauging the real-world implications of these quality dimensions for user satisfaction and examining more deeply how individual dimensions might impact overall ratings.

Analysis of User Reviews of Running Applications Using Text Mining: Focusing on Nike Run Club and Runkeeper (텍스트마이닝을 활용한 러닝 어플리케이션 사용자 리뷰 분석: Nike Run Club과 Runkeeper를 중심으로)

  • Gimun Ryu;Ilgwang Kim
    • Journal of Industrial Convergence / v.22 no.4 / pp.11-19 / 2024
  • The purpose of this study was to analyze user reviews of running applications using text mining. User reviews of Nike Run Club and Runkeeper were collected from the Google Play Store using the selenium package for Python 3 and used as the analysis data, and morphemes were separated with the OKT analyzer, leaving only Korean nouns. After morpheme separation, we created a rankNL dictionary to remove stopwords. To analyze the data, we used TF, TF-IDF, and LDA topic modeling. The results of this study are as follows. First, the keywords 'record', 'app', and 'workout' were identified as the top keywords in the user reviews of the Nike Run Club and Runkeeper applications, and the TF and TF-IDF rankings differed. Second, LDA topic modeling identified the topics 'basic items', 'additional features', 'errors', and 'location-based data' for Nike Run Club, and the topics 'errors', 'voice function', 'running data', 'benefits', and 'motivation' for Runkeeper. Based on the results, it is recommended that errors be fixed and improvements made to strengthen the competitiveness of the applications.
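
To illustrate why TF and TF-IDF rankings can differ, as the abstract above reports, here is a minimal sketch on toy data. The tokenized reviews are invented for illustration, and the unsmoothed `log(N / df)` inverse document frequency is one common variant, not necessarily the one used in the study.

```python
import math
from collections import Counter

# Toy tokenized reviews (hypothetical, not taken from the study).
reviews = [
    ["record", "app", "workout", "record"],
    ["app", "error", "record"],
    ["app", "voice", "workout"],
]

def term_frequency(docs):
    """Raw term counts summed over the whole corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts

def tf_idf(docs):
    """Per-document TF-IDF summed over the corpus, with idf = log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            scores[term] += count * math.log(n_docs / doc_freq[term])
    return scores

tf = term_frequency(reviews)
tfidf = tf_idf(reviews)
tf_rank = [term for term, _ in tf.most_common()]
tfidf_rank = [term for term, _ in tfidf.most_common()]
```

In this toy corpus 'app' ties for the highest raw frequency but appears in every review, so its TF-IDF score is zero and it drops to the bottom of the TF-IDF ranking.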