• Title/Summary/Keyword: LDA Model

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
  • To discover significant social issues, such as unemployment, economic crisis, and social welfare, that urgently need to be solved in modern society, researchers in the existing approach usually collect opinions from professional experts and scholars through online or offline surveys. However, this method is not always effective. Because of the expense, a large number of survey replies is seldom gathered, and in some cases it is hard to find experts who deal with a specific social issue. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach totally different conclusions about the same social issue because each expert has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which of them are really important. To overcome the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects keywords representing social issues and problems from about 1.3 million news articles published by about 10 major domestic presses in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting news articles and extracting their text, (2) identifying only the news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic.
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 as "Unemployment Problem". In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop a matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. For instance, given a set of text documents, we segment each text document into paragraphs. In the meantime, using LDA, we extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to the topic it best matches. As a result, each topic has several best matched paragraphs. Furthermore, suppose there are a topic (e.g., Unemployment Problem) and its best matched paragraph (e.g., Up to 300 workers lost their jobs in XXX company at Seoul). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly.
Through this prototype system, we have detected various social issues appearing in our society and have also shown the effectiveness of our proposed methods in our experimental results. Note that our proof-of-concept system is available at http://dslab.snu.ac.kr/demo.html.
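
The abstract above does not give the formula behind its matching step, so the following is only a minimal sketch of one way a paragraph could be scored against a topic from (i) topic terms and (ii) their probability values. The scoring rule (average log-probability with a small smoothing constant), the English tokenizer, the second topic, and all function names are assumptions made for illustration; only the Topic1 terms and probabilities come from the abstract.

```python
import math
import re

TOPICS = {
    # Terms and probabilities copied from the abstract's Topic1 example.
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    # Hypothetical second topic, added only so there is something to compare against.
    "Social Welfare": {"welfare": 0.5, "pension": 0.3, "care": 0.2},
}

# Assumed probability for any word that is not in a topic's term list.
SMOOTHING = 1e-3


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a stand-in for the
    Korean lexical analysis step mentioned in the abstract)."""
    return re.findall(r"[a-z]+", text.lower())


def log_score(paragraph, topic_terms):
    """Average log-probability of the paragraph's tokens under one topic."""
    tokens = tokenize(paragraph)
    if not tokens:
        return float("-inf")
    total = sum(math.log(topic_terms.get(tok, SMOOTHING)) for tok in tokens)
    return total / len(tokens)


def best_topic(paragraph, topics=TOPICS):
    """Return the label of the topic with the highest score for the paragraph."""
    return max(topics, key=lambda label: log_score(paragraph, topics[label]))


if __name__ == "__main__":
    print(best_topic("The layoff wave raised unemployment as business slowed"))
```

Averaging over tokens keeps long paragraphs from being penalized simply for their length. A paragraph that shares no terms with any topic scores the same under every topic, so a real system would need a threshold or a richer term list to handle that case.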

Topic Modeling Insomnia Social Media Corpus using BERTopic and Building Automatic Deep Learning Classification Model (BERTopic을 활용한 불면증 소셜 데이터 토픽 모델링 및 불면증 경향 문헌 딥러닝 자동분류 모델 구축)

  • Ko, Young Soo;Lee, Soobin;Cha, Minjung;Kim, Seongdeok;Lee, Juhee;Han, Ji Yeong;Song, Min
    • Journal of the Korean Society for Information Management / v.39 no.2 / pp.111-129 / 2022
  • Insomnia is a chronic disease in modern society, with the number of new patients increasing by more than 20% in the last 5 years. It is a serious disease that requires diagnosis and treatment because the individual and social problems caused by a lack of sleep are serious and the triggers of insomnia are complex. This study collected 5,699 documents from 'insomnia', a community on 'Reddit', a social media platform where users freely express opinions. Based on the International Classification of Sleep Disorders (ICSD-3) standard and guidelines prepared with the help of experts, an insomnia corpus was constructed by tagging the documents as insomnia-tendency or non-insomnia-tendency documents. Five deep learning language models (BERT, RoBERTa, ALBERT, ELECTRA, XLNet) were trained using the constructed insomnia corpus as training data. In the performance evaluation, RoBERTa showed the highest performance with an accuracy of 81.33%. For an in-depth analysis of the insomnia social data, topic modeling was performed using the recently introduced BERTopic method, which addresses weaknesses of the widely used LDA. The analysis confirmed 8 subject groups ('Negative emotions', 'Advice and help and gratitude', 'Insomnia-related diseases', 'Sleeping pills', 'Exercise and eating habits', 'Physical characteristics', 'Activity characteristics', 'Environmental characteristics'). Users expressed negative emotions and sought help and advice from the Reddit insomnia community. In addition, they mentioned diseases related to insomnia, shared discourse on the use of sleeping pills, and expressed interest in exercise and eating habits. As insomnia-related characteristics, we found physical characteristics such as breathing, pregnancy, and heart, activity characteristics such as zombies, hypnic jerk, and groggy, and environmental characteristics such as sunlight, blankets, temperature, and naps.

A Numerical Study of the Flow Field in the Combustion Chamber of the I.C Engine with Offset Valve (편심 밸브를 갖는 내연기관의 연소실 내부 유동장에 대한 수치적 연구)

  • 양희천;최영기;유홍선;고상근;허선무
    • Transactions of the Korean Society of Mechanical Engineers / v.16 no.8 / pp.1552-1565 / 1992
  • Three-dimensional numerical calculations were carried out for two different combustion chambers with an offset valve in order to investigate the effects of swirl and squish on the flow fields. A modified k-ε turbulence model was used that accounts for the change in density under the rapid compression and expansion caused by the piston. During the compression process, it was found that the squish flow, which controls the subsequent combustion process, was produced by the piston bowl in the bowl piston type combustion chambers but not in the flat piston type. A swirl velocity close to solid body rotation was maintained in the flat piston type combustion chambers, but for the bowl piston type a vortex resulting from the change of the solid body rotation was generated in the radial-circumferential plane. Regarding the swirl ratio, it was found that as the swirl ratio increases, a large and strong vortex is generated in the radial-circumferential plane of the bowl piston type combustion chambers because of the strong inward flows from the combustion chamber wall. These computational results were compared with the results of LDA measurements.

User Experience Analysis and Management Based on Text Mining: A Smart Speaker Case (텍스트 마이닝 기반 사용자 경험 분석 및 관리: 스마트 스피커 사례)

  • Dine Yeon;Gayeon Park;Hee-Woong Kim
    • Information Systems Review / v.22 no.2 / pp.77-99 / 2020
  • A smart speaker is a device that uses artificial intelligence to provide an interactive voice-based service for searching and using various information and content such as music, calendar, weather, and merchandise. Since AI technology provides more sophisticated and optimized services to users by accumulating data, early smart speaker manufacturers tried to build a platform through aggressive marketing. However, more than one third of users use their smart speakers less than once a month, and user satisfaction is only 49%. Accordingly, the need to strengthen the user experience of smart speakers has emerged in order to acquire a large number of users and to enable continuous use. Therefore, this study analyzes the user experience of the smart speaker and proposes a method for enhancing it. Based on the results of a two-stage analysis, we propose ways to enhance the user experience of smart speakers by model. Existing research on the user experience of the smart speaker has mainly relied on surveys and interviews, whereas this study collected actual review data written by users. This study also interpreted the analysis results based on smart speaker user experience dimensions. Its academic significance lies in developing these user experience dimensions and using them to interpret the text mining results. Based on the results of this study, we can suggest strategies for enhancing the user experience to smart speaker manufacturers.

Natural Convection in a Water Tank with a Heated Horizontal Plate Facing Downward (아래로 향한 수평가열판이 있는 수조에서의 자연대류)

  • Yang, Sun-Kyu;Chung, Moon-Ki;Helmut Hoffmann
    • Nuclear Engineering and Technology / v.27 no.3 / pp.301-316 / 1995
  • Experimental and computational studies are carried out to investigate the natural convection of single phase flow in a tank with a heated horizontal plate facing downward. This is a simplified model for investigating the influence of a core melt at the bottom of a reactor vessel on the thermal hydraulic behavior in a water filled cavity surrounding the vessel. In this case the vessel is simulated by a hexahedron insulated box with a heated plate horizontally mounted at the bottom of the box. The box with the heated plate is installed in a water filled hexahedron tank. Coolers are immersed in the U-type water volume between the box and the tank. Although multicomponent flows are more likely to exist below the heated plate in reality, the present study concentrates on single phase flow as a first step prior to investigating the complicated multicomponent thermal hydraulic phenomena. In the present study, in order to get a better understanding of the natural convection characteristics below the heated plate, the velocity and temperature are measured by LDA (Laser Doppler Anemometry) and thermocouples, respectively. Flow fields are visualized by taking pictures of the flow region with suspended particles. The results show the occurrence of a very effective circulation of the fluid in the whole flow area when the heater and coolers are put into operation. In the remote region below the heated plate the flow is nearly stagnant, and a remarkable temperature stratification can be observed with a very thin thermal boundary. Analytical predictions using the FLUTAN code show a reasonable match with the measured velocity fields.

An Exploratory research on patent trends and technological value of Organic Light-Emitting Diodes display technology (Organic Light-Emitting Diodes 디스플레이 기술의 특허 동향과 기술적 가치에 관한 탐색적 연구)

  • Kim, Mingu;Kim, Yongwoo;Jung, Taehyun;Kim, Youngmin
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.135-155 / 2022
  • This study analyzes patent trends by deriving the sub-technical fields of the Organic Light-Emitting Diodes (OLED) industry and analyzing the technology value, originality, and diversity of each sub-technical field. To collect patent data, a set of International Patent Classification (IPC) codes related to OLED technology was defined, and OLED-related patents filed from 2005 to 2017 were collected using this set of IPC codes. The large number of collected patent documents was then classified into 12 major technologies using the Latent Dirichlet Allocation (LDA) topic model, and trends for each technology were investigated. Patents related to touch sensor, module, image processing, and circuit driving showed an increasing trend, virtual reality and user interface recently decreased, and thin film transistor, fingerprint recognition, and optical film showed a continuous trend. To compare technological value, the number of forward citations, originality, and diversity of the patents included in each technology group were investigated. According to the results, image processing, user interface (UI) and user experience (UX), module, and adhesive technology, which had high numbers of forward citations, originality, and diversity, showed relatively high technological value. The results provide useful information for establishing a company's technology strategy.
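
The classify-then-trend step described in this abstract can be sketched as follows. This is an illustrative assumption, not the authors' procedure: it supposes that each patent already has a document-topic distribution from a fitted LDA model, assigns each patent to its most probable topic, and counts patents per year and topic. The toy data uses 3 topics instead of the study's 12, and every name and value below is hypothetical.

```python
from collections import Counter


def dominant_topic(doc_topic_probs):
    """Index of the most probable topic for one document."""
    return max(range(len(doc_topic_probs)), key=doc_topic_probs.__getitem__)


def yearly_topic_counts(patents):
    """Count patents per (filing year, dominant topic) pair."""
    counts = Counter()
    for year, probs in patents:
        counts[(year, dominant_topic(probs))] += 1
    return counts


# Hypothetical toy input: (filing year, document-topic distribution).
patents = [
    (2005, [0.7, 0.2, 0.1]),
    (2005, [0.1, 0.8, 0.1]),
    (2017, [0.6, 0.3, 0.1]),
    (2017, [0.5, 0.4, 0.1]),
]

counts = yearly_topic_counts(patents)

if __name__ == "__main__":
    for (year, topic), n in sorted(counts.items()):
        print(year, topic, n)
```

Comparing the counts for one topic across years gives a simple view of whether that technology group is growing or shrinking.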