• Title/Abstract/Keywords: Topic Modeling(LDA)

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
  • Significant social issues such as unemployment, economic crisis, and social welfare are urgent problems for modern society. In the existing approach to discovering them, researchers usually collect opinions from professional experts and scholars through online or offline surveys. However, this method is not always effective. Because of the expense, a large number of survey replies is seldom gathered, and in some cases it is hard to find professionals who deal with specific social issues. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach totally different conclusions about the same social issue because each has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which of them are really important. To overcome the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects keywords representing social issues and problems from about 1.3 million news articles issued by about 10 major domestic news outlets in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting news articles and extracting their texts, (2) identifying only news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic.
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 as "Unemployment Problem". In this example, it is non-trivial to understand what happened regarding the unemployment problem in our society. In other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop a matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. For instance, given a set of text documents, we segment each text document into paragraphs. Meanwhile, using LDA, we extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to the topic that it best matches. Finally, each topic has several best-matched paragraphs. Furthermore, suppose there are a topic (e.g., Unemployment Problem) and its best-matched paragraph (e.g., "Up to 300 workers lost their jobs in XXX company at Seoul"). In this case, we can grasp the detailed information of the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly.
Through this prototype system, we have detected various social issues appearing in our society and have also shown the effectiveness of our proposed methods through experimental results. Note that our proof-of-concept system is available at http://dslab.snu.ac.kr/demo.html.
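
The abstract above does not give the formula of its matching algorithm, so the following is only a minimal sketch of one way to score a paragraph against a topic from (i) the topic terms and (ii) their probability values. It assumes a unigram model with a small floor probability for words outside the topic and a length-normalized log-likelihood; the function names, the `floor` parameter, and the second topic are hypothetical illustrations, while Topic1 reuses the example values from the abstract.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def match_score(paragraph, topic_terms, floor=1e-6):
    """Average log-probability of the paragraph's tokens under one topic.

    Words missing from the topic get the assumed `floor` probability, and the
    average keeps long paragraphs from being penalized for length alone.
    """
    tokens = tokenize(paragraph)
    if not tokens:
        return float("-inf")
    total = sum(math.log(topic_terms.get(tok, floor)) for tok in tokens)
    return total / len(tokens)


def best_topic(paragraph, topics):
    """Return the label of the topic with the highest match score."""
    return max(topics, key=lambda label: match_score(paragraph, topics[label]))


topics = {
    # Topic1 from the abstract, with its human-assigned label.
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    # Hypothetical second topic, invented for this illustration only.
    "Welfare": {"welfare": 0.5, "pension": 0.3, "care": 0.2},
}

paragraph = "Layoff wave raises unemployment"
label = best_topic(paragraph, topics)
```

Here `label` ends up as the unemployment topic because two of the four tokens carry real topic probabilities, whereas every token falls back to the floor under the other topic. Ranking all paragraphs by `match_score` for a fixed topic would give that topic's best-matched paragraphs.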

Rural Tourism Image and Major Activity Space in Gochang County Shown in Social Data - Focusing on the Keyword 'Gochang-gun Travel' - (소셜데이터에 나타난 고창군의 농촌관광 이미지와 주요 활동공간 - '고창군 여행' 키워드를 중심으로 -)

  • Kim, Young-Jin;Son, Gwangryul;Lee, Dongchae;Son, Yong-hoon
    • Journal of Korean Society of Rural Planning / v.27 no.3 / pp.103-116 / 2021
  • In this study, the characteristics of the rural tourism image perceived by urban residents were analyzed through text analysis of blog data. To examine the images related to rural tourism, blog posts written with the keyword "Gochang-gun travel" were used. LDA topic analysis, a text mining technique, was used for the analysis. For the tourism image of Gochang-gun, 9 topics were derived, and 112 major places appeared. These were divided into 3 main activities and 5 object spaces through a review of the keywords and the original text of the blog data. The analysis showed that the traditional main resources of the region, Seonun mountain, Seonun temple, and Gochang-eup fortress, formed topics. On the other hand, world heritage sites such as the dolmens and Ungok wetland did not appear as topics. In particular, farms operated by the private sector formed individual topics, and theme farms can be seen as an important resource for tourism in Gochang-gun. Also, the distribution of place keywords made it possible to understand the characteristics of travel by region and the usage behavior of visitors. In Gochang-gun, visits were unevenly distributed across regions. This seems to be the result of Gochang-gun seeking to vitalize local tourism by focusing on natural, ecological, and scenic resources. It is necessary to establish a plan for balanced regional development and to develop other types of tourism resources. This study is distinctive in that it identified the types and characteristics of rural tourism images in the region as perceived by visitors, and the status of tourism at the regional level.

Topic modeling based similar user grouping and TV program recommendation for Smart TV (토픽 모델링을 이용한 유사 시청 사용자 그룹핑 및 TV 프로그램 추천 알고리듬)

  • Pyo, Shinjee;Kim, EunHui;Kim, Munchurl
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2012.07a / pp.117-120 / 2012
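  • This paper proposes a topic-modeling-based method for grouping users with similar TV viewing behavior and a TV program content recommendation algorithm that uses these groups. The proposed technique applies Latent Dirichlet Allocation (LDA), a topic modeling method, to group latent similar users within TV program viewing records, and uses this similar-viewer group information to automatically recommend preferred TV program content to users. To evaluate the proposed automatic recommendation algorithm, real TV viewing record data were split into a training period and a validation period. The proposed algorithm was used to generate a list of recommended TV program content for each individual user from the training period, and recommendation accuracy was verified by measuring how many of the recommended TV programs were actually watched during the validation period.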

Efficient Method for Image Representation Using Topic Modeling (토픽 모델링을 이용한 이미지의 효율적인 표현방법)

  • Lee, Ba-Do;Zhang, Byoung-Tak
    • Proceedings of the Korean Information Science Society Conference / 2011.06c / pp.319-322 / 2011
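  • Image representation using visual features is already widely used in the field of image retrieval. In particular, when an image carries no tags or other additional information, such preprocessing is essential for searching with only the information in the image content itself. For visual features obtained from images to be used as visual words, preprocessing is needed to quantize the visual features with a clustering algorithm such as k-means, but there is ambiguity in choosing the number of visual words k. In this paper, we confirm through image retrieval performance that, even with an arbitrary k, reducing the dimensionality of the data with LDA (Latent Dirichlet Allocation), a representative topic modeling technique, lets each topic represent a combination of several visual words, and that this method reduces the size of the representation and expresses image similarity more effectively in retrieval.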

A Study on Automatic Analysis System of National Defense Articles (국방 기사 자동 분석 시스템 구축 방안 연구)

  • Kim, Hyunjung;Kim, Wooju
    • Journal of the Korea Institute of Military Science and Technology / v.21 no.1 / pp.86-93 / 2018
  • Since media articles, which have a great influence on public opinion, are transmitted to the public through various media, it is very difficult to analyze them manually. Academia has discussed many methods for collecting, processing, and analyzing documents, but this work is mostly done in areas related to politics and stocks, and national defense articles are poorly researched. In this study, we explain how to build an automatic analysis system for national defense articles that can collect information on defense articles automatically and process it quickly by using topic modeling with LDA, sentiment analysis, and extraction-based text summarization.

Identifying Issue Changes of AI Chatbot 'Iruda' Case and Its Implications (AI 챗봇 '이루다' 논란의 이슈 변화와 시사점)

  • Choi, S.S.;Hong, A.R.
    • Electronics and Telecommunications Trends / v.36 no.2 / pp.93-101 / 2021
  • The controversy over the Artificial Intelligence (AI) chatbot "Iruda," which suspended its service 20 days after its launch, can be seen as the first case to inform the public of AI ethics issues. Against this background, this study examines the controversy and the social semantic formation around the "Iruda" service case using news topic modeling techniques. A total of 963 news articles were used for the analysis, and the event's duration was divided into periods based on major events, such as service start, controversy, and suspension, to understand its progress. From the analysis results, we obtain major keywords and a total of 16 topics (5, 4, 7) across the periods. Finally, the implications of this controversy for the development and utilization of AI services are discussed based on the analysis results.

Keyword Extraction Technique for Attractions using Online Reviews - Topic Modeling and Markov Chain (온라인 리뷰를 활용한 관광지 키워드 추출 기법 - 토픽 모델링과 Markov Chain)

  • Kim, MyeongSeon;Lee, KangWoo;Lim, JiWon;Hong, Soon-Goo
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.521-523 / 2021
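  • The importance of online reviews is growing in the tourism field. However, the text data of online reviews is difficult to grasp. In this study, to find a way to intuitively derive the overall opinion expressed in online review text data about a specific tourist attraction, we applied topic modeling and a Markov chain. After collecting online reviews about 'Haeundae', we derived topics using LDA and BTM, and visualized a Markov chain to examine the relationships between keywords and the overall content of the evaluations. Because each technique produced its own characteristic results, we propose using the various techniques complementarily.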
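
The abstract does not say how the Markov chain was built from the reviews. As a rough sketch, assuming a first-order chain whose states are keywords and whose transitions are estimated from adjacent keyword pairs within each review, the transition probabilities could be computed as follows. The function name and the tokenized sample reviews are made up for illustration and are not data from the study.

```python
from collections import Counter, defaultdict


def transition_probs(token_lists):
    """Estimate first-order keyword transition probabilities.

    Each review is a list of keywords; every adjacent pair (current, next)
    counts as one observed transition.
    """
    counts = defaultdict(Counter)
    for tokens in token_lists:
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {word: n / total for word, n in followers.items()}
    return probs


# Hypothetical, already-tokenized reviews (illustration only).
reviews = [
    ["beach", "clean", "sand"],
    ["beach", "crowded"],
    ["beach", "clean", "water"],
]
probs = transition_probs(reviews)
```

With these sample reviews, `probs["beach"]` assigns two thirds of the probability mass to "clean" and one third to "crowded". Drawing the keywords as nodes and these probabilities as weighted edges gives the kind of keyword-relationship graph the abstract mentions visualizing.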

Study of Mental Disorder Schizophrenia, based on Big Data

  • Hye-Sun Lee
    • International Journal of Advanced Culture Technology / v.11 no.4 / pp.279-285 / 2023
  • This study provides academic implications by examining trends in domestic research on schizophrenia and psychosocial therapy. For the analysis, text mining with the R program and social network analysis were used, and 65 papers were collected. The results of this study are as follows. First, the collected data were visualized through keyword analysis using the word cloud method. Second, keyword frequency analysis showed that keywords such as intervention, schizophrenia, research, patients, program, effect, society, mind, ability, and function were recorded with the highest frequency. Third, the LDA (latent Dirichlet allocation) topic modeling result was classified into 3 keywords: patient, subjects, intervention of psychosocial, efficacy of interventions. Fourth, the social network analysis derived connectivity, closeness centrality, and betweenness centrality. In conclusion, this study presents significant results in that it provides basic rehabilitation data for schizophrenia and psychosocial therapy by using big data methods, namely text mining and social network analysis, to identify research trends in schizophrenia and psychosocial therapy and to present the results through visualization.

Text-Mining Analysis on the Interaction between the American Consumers Aged over 60 and Companion Pets Robots: Focused on Amazon Reviews for Joy For All Companion Pets (텍스트 마이닝을 활용한 미국 노년 소비자와 애완용 로봇 간 상호작용에 대한 분석: Joy For All Companion Pets에 대한 아마존 리뷰를 중심으로)

  • Chung, Yea-Eun;Lee, Yu Lim;Chung, Jae-Eun
    • Journal of Digital Convergence / v.19 no.10 / pp.469-489 / 2021
  • This study explores consumers' responses to socially assistive robots using text mining, focusing on Hasbro's Companion Pets, which provide emotional support. We conducted text frequency analysis and LDA analysis using R. The key findings are: 1) the most frequently used words relate to the mimicry of living pets and the appearance of the companion pets; 2) five topics were derived from the LDA analysis, and the keywords in each topic were classified as positive or negative; 3) user, product, and environment factors affect the interaction between consumers and companion pets; and 4) consumers with cognitive or physical difficulties use companion pets to replace living pets. This study provides an understanding of consumer responses to companion pets and offers practical implications that may improve their efficacy for consumers and the understanding of companion robots that provide emotional support during COVID-19.

Accessibility Analysis Method based on Public Facility Attraction Index Using SNS Data (SNS 데이터를 이용한 공공시설 매력도지수에 따른 접근성 분석기법)

  • Lee, Ji Won;Yu, Ki Yun;Kim, Ji Young
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.1 / pp.29-42 / 2019
  • To expand the qualitative evaluation of public facilities, this study used SNS data to derive user-oriented preference factors for public facilities and then quantified them in terms of the supply side and the demand side. To derive the preference factors, LDA, a topic modeling technique, was used, and an attraction index was calculated for each facility. In addition, we analyzed spatial accessibility using the 2SFCA model to measure the degree of service experienced by users. The study area covered the public libraries of Seoul, Korea. As a result, five topics were extracted as preference factors for public libraries: Circumstance, Scale of facility, Cultural program, Parenting, and Books and materials. In particular, the circumstance and parenting topics were newly derived preference factors not identified in previous studies. When the attraction index was calculated for each library, Songpa Library, Jungdok Library, and Namsan Library scored high. Songpa Library was rated well on the parenting factor, and Jungdok and Namsan Libraries on the circumstance factor. Accessibility appears to be better in the center of Seoul, where public libraries are concentrated, and to decrease toward the outskirts. We expect that the proposed method will contribute to user-oriented public facility evaluation and policy decision making.
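
For readers unfamiliar with 2SFCA (two-step floating catchment area), the sketch below shows the basic, unweighted form of the method. It is not the study's implementation: how the attraction index enters the supply term is not stated in the abstract, so `supply` here is simply an assumed per-facility value, and the function name, facility names, populations, distances, and threshold are all hypothetical.

```python
def two_step_fca(supply, population, distance, threshold):
    """Basic two-step floating catchment area (2SFCA) accessibility.

    Step 1: for each facility, divide its supply by the total population
            living within `threshold` distance of it.
    Step 2: for each demand location, sum the step-1 ratios of all
            facilities within `threshold` distance.
    `distance[i][j]` is the distance from demand location i to facility j.
    """
    ratios = {}
    for facility, capacity in supply.items():
        demand = sum(
            people
            for location, people in population.items()
            if distance[location][facility] <= threshold
        )
        ratios[facility] = capacity / demand if demand > 0 else 0.0

    return {
        location: sum(
            ratio
            for facility, ratio in ratios.items()
            if distance[location][facility] <= threshold
        )
        for location in population
    }


# Hypothetical inputs (illustration only).
supply = {"LibA": 10.0, "LibB": 5.0}
population = {"d1": 100, "d2": 100, "d3": 50}
distance = {
    "d1": {"LibA": 1.0, "LibB": 5.0},
    "d2": {"LibA": 2.0, "LibB": 2.0},
    "d3": {"LibA": 6.0, "LibB": 1.0},
}
access = two_step_fca(supply, population, distance, threshold=3.0)
```

In this toy setup, location `d2` lies within the threshold of both facilities and therefore receives the highest accessibility score, which mirrors the abstract's observation that accessibility is better where libraries are concentrated. Replacing the raw supply values with attraction-weighted values would be one way to combine the two parts of the study, but that combination is an assumption.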