• Title/Summary/Keyword: structured topic Modeling

Search Result 16, Processing Time 0.022 seconds

A Study on the Trends of Construction Safety Accident in Unstructured Text Using Topic Modeling (비정형 텍스트 기반의 토픽 모델링을 이용한 건설 안전사고 동향 분석)

  • Lee, Sang-Gyu
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.10
    • /
    • pp.176-182
    • /
    • 2018
  • In order to understand and track the trends of construction safety accident, this study shows the topic trends in the construction safety accident with LDA(Latent Dirichlet Allocation)-based topic modeling method for data analytics. Especially, it performs to figure out the main issue of construction safety accident with unstructured data analysis based on the topic modeling rather than a variety of structured data analysis for preventing to safety accident in construction industry. To apply this methodology, I randomly collected to 540 news article data about construction accident from January 2017 to February 2018. Based on the unstructured data with the LDA-based topic modeling, I found the 10 topics and identified key issues through 10 keyword in each 10 topics. I forecasted the topic issue related to construction safety accident based on analysis of time-series trends about the news data from January 2017 to February 2018. With this method, this research gives a hint about ways of using unstructured news article data to anticipate safety policy and research field and to respond to construction accident safety issues in the future.

A Study on the News Frame of COVID-19 Vaccine through Structural Topic Modeling and Semantic Network Analysis

  • Eun-Ji Yun;Bo-Young Kang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.5
    • /
    • pp.129-153
    • /
    • 2023
  • This study was conducted in the context of the Covid-19 pandemic by analyzing a large amount of press report frames regarding the Covid-19 vaccine which is of great public interest, in order to explore the role and direction of trusted media as core elements of crisis communication. The study period lasted for eight months beginning in November 2020 when the development of the Covid-19 vaccine was in progress until June 2021. Set-up as research subjects were the Chosun Ilbo, Joongang Ilbo, Dong-A Ilbo and Hankyoreh according to their public confidence rankings and number of readers.The analysis method used structured topic Modeling (STM) and semantic network analysis. As a result, based on a clear cluster of word structures and a central analysis value, a total of 64 relevant frames, 16 for each news company, were gathered. In the third phase a comparative analysis of the four news companies was carried out to verify the organizational degree of the frames and substantial differences.

Investigations on Techniques and Applications of Text Analytics (텍스트 분석 기술 및 활용 동향)

  • Kim, Namgyu;Lee, Donghoon;Choi, Hochang;Wong, William Xiu Shun
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.42 no.2
    • /
    • pp.471-492
    • /
    • 2017
  • The demand and interest in big data analytics are increasing rapidly. The concepts around big data include not only existing structured data, but also various kinds of unstructured data such as text, images, videos, and logs. Among the various types of unstructured data, text data have gained particular attention because it is the most representative method to describe and deliver information. Text analysis is generally performed in the following order: document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis. The results of the analysis can be displayed through word cloud, word network, topic modeling, document classification, and semantic analysis. Notably, there is an increasing demand to identify trending topics from the rapidly increasing text data generated through various social media. Thus, research on and applications of topic modeling have been actively carried out in various fields since topic modeling is able to extract the core topics from a huge amount of unstructured text documents and provide the document groups for each different topic. In this paper, we review the major techniques and research trends of text analysis. Further, we also introduce some cases of applications that solve the problems in various fields by using topic modeling.

Machine Learning Method in Medical Education: Focusing on Research Case of Press Frame on Asbestos (의학교육에서 기계학습방법 교육: 석면 언론 프레임 연구사례를 중심으로)

  • Kim, Junhewk;Heo, So-Yun;Kang, Shin-Ik;Kim, Geon-Il;Kang, Dongmug
    • Korean Medical Education Review
    • /
    • v.19 no.3
    • /
    • pp.158-168
    • /
    • 2017
  • There is a more urgent call for educational methods of machine learning in medical education, and therefore, new approaches of teaching and researching machine learning in medicine are needed. This paper presents a case using machine learning through text analysis. Topic modeling of news articles with the keyword 'asbestos' were examined. Two hypotheses were tested using this method, and the process of machine learning of texts is illustrated through this example. Using an automated text analysis method, all the news articles published from January 1, 1990 to November 15, 2016 in South Korea which included 'asbestos' in the title and the body were collected by web scraping. Differences in topics were analyzed by structured topic modelling (STM) and compared by press companies and periods. More articles were found in liberal media outlets. Differences were found in the number and types of topics in the articles according to the partisanship and period. STM showed that the conservative press views asbestos as a personal problem, while the progressive press views asbestos as a social problem. A divergence in the perspective for emphasizing the issues of asbestos between the conservative press and progressive press was also found. Social perspective influences the main topics of news stories. Thus, the patients' uneasiness and pain are not presented by both sources of media. In addition, topics differ between news media sources based on partisanship, and therefore cause divergence in readers' framing. The method of text analysis and its strengths and weaknesses are explained, and an application for the teaching and researching of machine learning in medical education using the methodology of text analysis is considered. An educational method of machine learning in medical education is urgent for future generations.

Topic Automatic Extraction Model based on Unstructured Security Intelligence Report (비정형 보안 인텔리전스 보고서 기반 토픽 자동 추출 모델)

  • Hur, YunA;Lee, Chanhee;Kim, Gyeongmin;Lim, HeuiSeok
    • Journal of the Korea Convergence Society
    • /
    • v.10 no.6
    • /
    • pp.33-39
    • /
    • 2019
  • As cyber attack methods are becoming more intelligent, incidents such as security breaches and international crimes are increasing. In order to predict and respond to these cyber attacks, the characteristics, methods, and types of attack techniques should be identified. To this end, many security companies are publishing security intelligence reports to quickly identify various attack patterns and prevent further damage. However, the reports that each company distributes are not structured, yet, the number of published intelligence reports are ever-increasing. In this paper, we propose a method to extract structured data from unstructured security intelligence reports. We also propose an automatic intelligence report analysis system that divides a large volume of reports into sub-groups based on their topics, making the report analysis process more effective and efficient.

Classification and Recommendation of Scene Templates for PR Video Making Service based on Strategic Meta Information (홍보동영상 제작 서비스를 위한 전략메타정보 기반 장면템플릿 분류 및 추천)

  • Park, Jongbin;Lee, Han-Duck;Kim, Kyung-Won;Jung, Jong-Jin;Lim, Tae-Beom
    • Journal of Broadcast Engineering
    • /
    • v.20 no.6
    • /
    • pp.848-861
    • /
    • 2015
  • In this paper, we introduce a new web-based PR video making service system. Many video editing tools have required tough editing skill or scenario planning stage for a just simple PR video making. Some users may prefer a simple and fast way than sophisticated and complex functionality. To solve this problem, it is important to provide easy user interface and intelligent classification and recommendation scheme. Therefore, we propose a new template classification and recommendation scheme using a topic modeling method. The proposed scheme has the big advantage of being able to handle the unstructured meta data as well as structured one.

A Study on the Evaluation of Importance of Factors Affecting the Vessel Value (선박가치 변화요인에 관한 중요도 평가 연구)

  • Choi, Jung-Suk;Namgung, Ho
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.28 no.1
    • /
    • pp.91-99
    • /
    • 2022
  • The shipping industry is a service industry that operates its business by transporting cargo on ships and receiving freight. Therefore, large-scale capital investment is required for ship operation, and if the value of the ship is uncertain, the risk of shipping management increases. This study aims to identify the factors affecting changes in ship value and to analyze the importance of each variable. To achieve the goal, the factors affecting changes in ship value were identified and structured using the techniques of text mining and topic modeling, and classified into three main factors and 12 sub-factors. This study used AHP analysis to examine the relative importance of each factor. Results indicated that the main factor influencing the change in the vessel value was the shipping factor, followed by the investment factor and the environment factor. Other auxiliary factors that substantially affect the ship value include the volatility of the shipping market and of shipping freight.

Risk Assessment of the Ship′s Collision using Formal Safety Assessment Methodology (공식안전평가시스템에 의한 선박 충돌사고 위험성 평가에 관한 연구( I ))

  • 양원재;전승환;금종수
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.7 no.3
    • /
    • pp.61-74
    • /
    • 2001
  • The prevention of marine accidents has been a major topic in marine society and various policies and countermeasures has been developed, applied to the industries. Formal Safety Assessment is a structured and systematic methodology, aimed at enhancing maritime safety, including protection of life, health, the marine environment and property, by using risk and cost-benefit assessments. In addition, it provides a means of being proactive, enabling potential hazards to be considered before a serious accident occurs. In this paper, we has been screening and ranking of hazards using fuzzy structural modeling method and quantitative risk assessment for the ship's collision in the last 10 years marine accidents.

  • PDF

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.155-174
    • /
    • 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (Coronavirus-2, a fatal respiratory syndrome) have been published. The rapid increase in the number of papers related to COVID-19 is putting time and technical constraints on healthcare professionals and policy makers to quickly find important research. Therefore, in this study, we propose a method of extracting useful information from text data of extensive literature using LDA and Word2vec algorithm. Papers related to keywords to be searched were extracted from papers related to COVID-19, and detailed topics were identified. The data used the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic, updated weekly. The research methods are divided into two main categories. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers including full text. For this purpose, the number of publications related to COVID-19 by year was analyzed through exploratory data analysis using a Python program, and the top 10 journals under active research were identified. LDA and Word2vec algorithm were used to derive research topics related to COVID-19, and after analyzing related words, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, and a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment' were extracted. did For each collected paper, detailed topics were analyzed using LDA and Word2vec algorithms, and a clustering method through PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy point from the results of this study is that the topics that were not derived from the topics derived for all papers being researched in relation to COVID-19 (

    ) were the topic modeling results for each research topic (
    ) was found to be derived from For example, as a result of topic modeling for papers related to 'vaccine', a new topic titled Topic 05 'neutralizing antibodies' was extracted. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and is said to play an important role in the production of therapeutic agents and vaccine development. In addition, as a result of extracting topics from papers related to 'treatment', a new topic called Topic 05 'cytokine' was discovered. A cytokine storm is when the immune cells of our body do not defend against attacks, but attack normal cells. Hidden topics that could not be found for the entire thesis were classified according to keywords, and topic modeling was performed to find detailed topics. In this study, we proposed a method of extracting topics from a large amount of literature using the LDA algorithm and extracting similar words using the Skip-gram method that predicts the similar words as the central word among the Word2vec models. The combination of the LDA model and the Word2vec model tried to show better performance by identifying the relationship between the document and the LDA subject and the relationship between the Word2vec document. In addition, as a clustering method through PCA dimension reduction, a method for intuitively classifying documents by using the t-SNE technique to classify documents with similar themes and forming groups into a structured organization of documents was presented. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of academic papers related to COVID-19, it will reduce the precious time and effort of healthcare professionals and policy makers, and rapidly gain new insights. We hope to help you get It is also expected to be used as basic data for researchers to explore new research directions.

  • Improving Performance of Recommendation Systems Using Topic Modeling (사용자 관심 이슈 분석을 통한 추천시스템 성능 향상 방안)

    • Choi, Seongi;Hyun, Yoonjin;Kim, Namgyu
      • Journal of Intelligence and Information Systems
      • /
      • v.21 no.3
      • /
      • pp.101-116
      • /
      • 2015
    • Recently, due to the development of smart devices and social media, vast amounts of information with the various forms were accumulated. Particularly, considerable research efforts are being directed towards analyzing unstructured big data to resolve various social problems. Accordingly, focus of data-driven decision-making is being moved from structured data analysis to unstructured one. Also, in the field of recommendation system, which is the typical area of data-driven decision-making, the need of using unstructured data has been steadily increased to improve system performance. Approaches to improve the performance of recommendation systems can be found in two aspects- improving algorithms and acquiring useful data with high quality. Traditionally, most efforts to improve the performance of recommendation system were made by the former approach, while the latter approach has not attracted much attention relatively. In this sense, efforts to utilize unstructured data from variable sources are very timely and necessary. Particularly, as the interests of users are directly connected with their needs, identifying the interests of the user through unstructured big data analysis can be a crew for improving performance of recommendation systems. In this sense, this study proposes the methodology of improving recommendation system by measuring interests of the user. Specially, this study proposes the method to quantify interests of the user by analyzing user's internet usage patterns, and to predict user's repurchase based upon the discovered preferences. There are two important modules in this study. The first module predicts repurchase probability of each category through analyzing users' purchase history. We include the first module to our research scope for comparing the accuracy of traditional purchase-based prediction model to our new model presented in the second module. This procedure extracts purchase history of users. The core part of our methodology is in the second module. This module extracts users' interests by analyzing news articles the users have read. The second module constructs a correspondence matrix between topics and news articles by performing topic modeling on real world news articles. And then, the module analyzes users' news access patterns and then constructs a correspondence matrix between articles and users. After that, by merging the results of the previous processes in the second module, we can obtain a correspondence matrix between users and topics. This matrix describes users' interests in a structured manner. Finally, by using the matrix, the second module builds a model for predicting repurchase probability of each category. In this paper, we also provide experimental results of our performance evaluation. The outline of data used our experiments is as follows. We acquired web transaction data of 5,000 panels from a company that is specialized to analyzing ranks of internet sites. At first we extracted 15,000 URLs of news articles published from July 2012 to June 2013 from the original data and we crawled main contents of the news articles. After that we selected 2,615 users who have read at least one of the extracted news articles. Among the 2,615 users, we discovered that the number of target users who purchase at least one items from our target shopping mall 'G' is 359. In the experiments, we analyzed purchase history and news access records of the 359 internet users. From the performance evaluation, we found that our prediction model using both users' interests and purchase history outperforms a prediction model using only users' purchase history from a view point of misclassification ratio. In detail, our model outperformed the traditional one in appliance, beauty, computer, culture, digital, fashion, and sports categories when artificial neural network based models were used. Similarly, our model outperformed the traditional one in beauty, computer, digital, fashion, food, and furniture categories when decision tree based models were used although the improvement is very small.


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.