• Title/Summary/Keyword: Topic Modeling(LDA)

Search Result 292, Processing Time 0.025 seconds

Topic Modeling with Deep Learning-based Sentiment Filters (감정 딥러닝 필터를 활용한 토픽 모델링 방법론)

  • Choi, Byeong-Seol;Kim, Namgyu
    • The Journal of Information Systems
    • /
    • v.28 no.4
    • /
    • pp.271-291
    • /
    • 2019
  • Purpose The purpose of this study is to propose a methodology to derive positive keywords and negative keywords through deep learning to classify reviews into positive reviews and negative ones, and then refine the results of topic modeling using these keywords. Design/methodology/approach In this study, we extracted topic keywords by performing LDA-based topic modeling. At the same time, we performed attention-based deep learning to identify positive and negative keywords. Finally, we refined the topic keywords using these keywords as filters. Findings We collected and analyzed about 6,000 English reviews of Gyeongbokgung, a representative tourist attraction in Korea, from Tripadvisor, a representative travel site. Experimental results show that the proposed methodology properly identifies positive and negative keywords describing major topics.

Analysis of Shipping and Logistics News Articles using Topic Modeling (토픽모델링을 활용한 해운물류 뉴스 분석)

  • Hee-Young Yoon;Il-Youp Kwak
    • Korea Trade Review
    • /
    • v.46 no.4
    • /
    • pp.61-76
    • /
    • 2021
  • This study focuses on three logistics-related news (Logistics Newspaper, Korea Shipping Gadget, and Korea Shipping Newspaper) in order to present changes in logistics issues, centering on Corona 19, which has recently had the greatest impact in the world. For data collection, two-year news articles in 2019 and 2020 (title, article, content, date, article classification, article URL) were collected through web crawling (using Python's BeautifulSoup, requests module) on the homepages of three representative logistics-related media companies. As for the data analysis methods, fundamental statistical analysis, Latent Dirichlet Allocation (LDA) for topic modeling, and Scattertext were performed. The analysis results were as follows. First, among the three news media related to logistics, the Korea Shipping Newspaper was carrying out the most active media activities. Second, through topic modeling with LDA, eight logistics-related topics were identified, and keywords and significant issues of each topic were presented. Third, the keywords were visually expressed through Scattertext. This is the first study to present changes in the logistics field, focusing on articles from representative logistics-related media in 2019 and 2020. In particular, 2019 and 2020 can be divided into before and after the outbreak of Corona 19, which has had a great impact not only on the logistics field but also on our lives as a whole. For future work, a multi-faceted approach is required, such as comparative studies of logistics issues between countries or presenting implications based on long-term time-series articles.

Classification of Public Perceptions toward Smog Risks on Twitter Using Topic Modeling (Topic Modeling을 이용한 Twitter상에서 스모그 리스크에 관한 대중 인식 분류 연구)

  • Kim, Yun-Ki
    • Journal of Cadastre & Land InformatiX
    • /
    • v.47 no.1
    • /
    • pp.53-79
    • /
    • 2017
  • The main purpose of this study was to detect and classify public perceptions toward smog disasters on Twitter using topic modeling. To help achieve these objectives and to identify gaps in the literature, this research carried out a literature review on public opinions toward smog disasters and topic modeling. The literature review indicated that there are huge gaps in the related literature. In this research, this author formed five research questions to fill the gaps in the literature. And then this study performed research steps such as data extraction, word cloud analysis on the cleaned data, building the network of terms, correlation analysis, hierarchical cluster analysis, topic modeling with the LDA, and stream graphs to answer those research questions. The results of this research revealed that there exist huge differences in the most frequent terms, the shapes of terms network, types of correlation, and smog-related topics changing patterns between New York and London. Therefore, this author could find positive answers to the four of the five research questions and a partially positive answer to Research question 4. Finally, on the basis of the results, this author suggested policy implications and recommendations for future study.

Trend Analysis of Dance Performance Research Using Keywords and Topic Modeling of LDA Techniques (LDA 토픽 모델링 기법을 활용한 무용공연의 연구 동향 분석)

  • SI YU
    • Journal of Industrial Convergence
    • /
    • v.22 no.3
    • /
    • pp.13-25
    • /
    • 2024
  • This study explores research topics related to dance performances published in Korea based on big data and examines research trends that change according to the trend of the times. The results derived from topic modeling analysis are as follows. (1) Six major topics were derived: a study on marketing strategies and development plans for dance performances, (2) a study on the re-watching factors of dance performance space and performance satisfaction, (3) a study on the popularity and contribution of dance performances in the stage environment, (4) a study on the current status of dance performances and the convergence of dance group operations, (5) a study on the definition of dance performances using various social media, and (6) a study on the direction and development of technology-applied dance performance contents. Accordingly, research trends and topics related to dance, including dance performances, social changes, key keywords of researchers' change interests were extracted, and keywords were compared and analyzed to present academic changes and countermeasures. Accordingly, the need for research to apply new technologies was emphasized as it diversified and fused.

Recent Research Trend Analysis for the Journal of Society of Korea Industrial and Systems Engineering Using Topic Modeling (토픽모델링을 활용한 한국산업경영시스템학회지의 최근 연구주제 분석)

  • Dong Joon Park;Pyung Hoi Koo;Hyung Sool Oh;Min Yoon
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.46 no.3
    • /
    • pp.170-185
    • /
    • 2023
  • The advent of big data has brought about the need for analytics. Natural language processing (NLP), a field of big data, has received a lot of attention. Topic modeling among NLP is widely applied to identify key topics in various academic journals. The Korean Society of Industrial and Systems Engineering (KSIE) has published academic journals since 1978. To enhance its status, it is imperative to recognize the diversity of research domains. We have already discovered eight major research topics for papers published by KSIE from 1978 to 1999. As a follow-up study, we aim to identify major topics of research papers published in KSIE from 2000 to 2022. We performed topic modeling on 1,742 research papers during this period by using LDA and BERTopic which has recently attracted attention. BERTopic outperformed LDA by providing a set of coherent topic keywords that can effectively distinguish 36 topics found out this study. In terms of visualization techniques, pyLDAvis presented better two-dimensional scatter plots for the intertopic distance map than BERTopic. However, BERTopic provided much more diverse visualization methods to explore the relevance of 36 topics. BERTopic was also able to classify hot and cold topics by presenting 'topic over time' graphs that can identify topic trends over time.

Features of the Rural Revitalization Projects in Jang-su County Using LDA Topic Analysis of News Data - Focused on Keyword of Tourism and Livelihood - (뉴스데이터의 LDA 토픽 분석을 통한 장수군 농촌지역 활성화 사업의 특징 - 관광·생활 키워드를 중심으로 -)

  • Kim, Young-Jin;Son, Yong-hoon
    • Journal of Korean Society of Rural Planning
    • /
    • v.24 no.4
    • /
    • pp.69-80
    • /
    • 2018
  • In this study, we typified the project for revitalizing the rural area through text analysis using news data, and analyzed the main direction and characteristics of the project. In order to examine the factors emphasized among the issues related to the revitalization of rural areas, we used news data related to 'tourism' and 'livelihood', which are the main keyword of the project to promote rural areas. In the analysis, text mining techniques were used. Topic modeling was conducted on LDA techniques for major projects in 'tourism' and 'livelihood' keyword. Based on this, this study typified the projects that are carried out for the activation of rural areas by topic. As a result of the analysis, it was fount that the topics included in the project were distributed in 11 sub-types(Tourism Promotion, Regional Specialization, Local Festival, Development of Regional Scale, Urban and Rural Exchange, Agricultural Support, Community Forest Management, Improve the Settlement Environment, General Welfare Service, Low Class Support, Others). The characteristics of the rural revitalization projects were examined, and it was confirmed that domestic projects were carried out by tourism-oriented projects. To summarize, the government is making projects to revitalize rural areas through related ministries. Within the structure where the project is spreading to the region, a lot of projects are being carried out. It is understood that the tourism and welfare oriented projects are being carried out in the revitalization project of the domestic rural area. Therefore, in order to achieve the goal of rural revitalization, it is believed that it will be effective to carry out a balanced project to improve the settlement environment of the residents.

A Study on the Document Topic Extraction System Based on Big Data (빅데이터 기반 문서 토픽 추출 시스템 연구)

  • Hwang, Seung-Yeon;An, Yoon-Bin;Shin, Dong-Jin;Oh, Jae-Kon;Moon, Jin Yong;Kim, Jeong-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.5
    • /
    • pp.207-214
    • /
    • 2020
  • Nowadays, the use of smart phones and various electronic devices is increasing, the Internet and SNS are activated, and we live in the flood of information. The amount of information has grown exponentially, making it difficult to look at a lot of information, and more and more people want to see only key keywords in a document, and the importance of research to extract topics that are the core of information is increasing. In addition, it is also an important issue to extract the topic and compare it with the past to infer the current trend. Topic modeling techniques can be used to extract topics from a large volume of documents, and these extracted topics can be used in various fields such as trend prediction and data analysis. In this paper, we inquire the topic of the three-year papers of 2016, 2017, and 2018 in the field of computing using the LDA algorithm, one of Probabilistic Topic Model Techniques, in order to analyze the rapidly changing trends and keep pace with the times. Then we analyze trends and flows of research.

A Text Mining Study on Endangered Wildlife Complaints - Discovery of Key Issues through LDA Topic Modeling and Network Analysis - (멸종위기 야생생물 민원 텍스트 마이닝 연구 - LDA 토픽 모델링과 네트워크 분석을 통한 주요 이슈 발굴 -)

  • Kim, Na-Yeong;Nam, Hee-Jung;Park, Yong-Su
    • Journal of the Korean Society of Environmental Restoration Technology
    • /
    • v.26 no.6
    • /
    • pp.205-220
    • /
    • 2023
  • This study aimed to analyze the needs and interests of the public on endangered wildlife using complaint big data. We collected 1,203 complaints and their corresponding text data on endangered wildlife, pre-processed them, and constructed a document-term matrix for 1,739 text data. We performed LDA (Latent Dirichlet Allocation) topic modeling and network analysis. The results revealed that the complaints on endangered wildlife peaked in June-August, and the interest shifted from insects to various endangered wildlife in the living area, such as mammals, birds, and amphibians. In addition, the complaints on endangered wildlife could be categorized into 8 topics and 5 clusters, such as discovery report, habitat protection and response request, information inquiry, investigation and action request, and consultation request. The co-occurrence network analysis for each topic showed that the keywords reflecting the call center reporting procedure, such as photo, send, and take, had high centrality in common, and other keywords such as dung beetle, know, absence and think played an important role in the network. Through this analysis, we identified the main keywords and their relationships within each topic and derived the main issues for each topic. This study confirmed the increasing and diversifying public interest and complaints on endangered wildlife and highlighted the need for professional response. We also suggested developing and extending participatory conservation plans that align with the public's preferences and demands. This study demonstrated the feasibility of using complaint big data on endangered wildlife and its implications for policy decision-making and public promotion on endangered wildlife.

Topic Model Augmentation and Extension Method using LDA and BERTopic (LDA와 BERTopic을 이용한 토픽모델링의 증강과 확장 기법 연구)

  • Kim, SeonWook;Yang, Kiduk
    • Journal of the Korean Society for information Management
    • /
    • v.39 no.3
    • /
    • pp.99-132
    • /
    • 2022
  • The purpose of this study is to propose AET (Augmented and Extended Topics), a novel method of synthesizing both LDA and BERTopic results, and to analyze the recently published LIS articles as an experimental approach. To achieve the purpose of this study, 55,442 abstracts from 85 LIS journals within the WoS database, which spans from January 2001 to October 2021, were analyzed. AET first constructs a WORD2VEC-based cosine similarity matrix between LDA and BERTopic results, extracts AT (Augmented Topics) by repeating the matrix reordering and segmentation procedures as long as their semantic relations are still valid, and finally determines ET (Extended Topics) by removing any LDA related residual subtopics from the matrix and ordering the rest of them by F1 (BERTopic topic size rank, Inverse cosine similarity rank). AET, by comparing with the baseline LDA result, shows that AT has effectively concretized the original LDA topic model and ET has discovered new meaningful topics that LDA didn't. When it comes to the qualitative performance evaluation, AT performs better than LDA while ET shows similar performances except in a few cases.

Topic Modeling on Fine Dust Issues Using LDA Analysis (LDA 기법을 이용한 미세먼지 이슈의 토픽모델링 분석)

  • Yoon, soonuk;Kim, Minchul
    • Journal of Energy Engineering
    • /
    • v.29 no.2
    • /
    • pp.23-29
    • /
    • 2020
  • In this study, the last 10 years of news data on fine dust was collected and 80 topics are selected through LDA analysis. As a result, weather-related information made up the main words for the topic, and we can see that fine dust becomes a big issue below 10 degrees Celsius. The frequency of exposure to the media and the maximum concentration of fine dust are correlated with positive. Topics related to fine dust reduction measures and the government's comprehensive measures over the past decade, topics related to products such as air purifiers related to fine dust, topics related to policies protecting vulnerable people from fine dust, and topics on fine dust reduction through R&D were found to be major topics. Measures against fine dust as a social issue can be seen to be closely related to the government's policy.