• Title/Summary/Keyword: 토픽 추출 (topic extraction)

Search Result 211, Processing Time 0.027 seconds

Analyzing the Trend of False·Exaggerated Advertisement Keywords Using Text-mining Methodology (1990-2019) (텍스트마이닝 기법을 활용한 허위·과장광고 관련 기사의 트렌드 분석(1990-2019))

  • Kim, Do-Hee;Kim, Min-Jeong
    • The Journal of the Korea Contents Association / v.21 no.4 / pp.38-49 / 2021
  • This study analyzed trends in the term 'false and exaggerated advertisement' across 5,141 newspaper articles published from 1990 to 2019 using text mining. First, we identified the most frequent keywords related to false and exaggerated advertisements through frequency analysis of all the articles and examined the context connecting the extracted keywords. Next, to examine how false and exaggerated advertisements have changed, we ran the frequency analysis separately for each decade and traced the keywords that became issues by comparing them with the number of academic papers on the top keywords of each year. Finally, we used topic modeling to identify trends in false and exaggerated advertisements from the detailed keywords within each topic. The results confirmed that topics that became issues at a specific time were extracted as frequent keywords, and that keyword trends by period changed in connection with social and environmental factors. This study is meaningful in that it helps consumers spend wisely by building background knowledge about unfair advertising. Furthermore, the extracted core keywords are expected to clarify the true purpose of advertising and convey its implications to companies and employees who commit misconduct.
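The per-decade frequency analysis described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name `keyword_counts_by_decade`, the input shape, and the toy data are all assumptions, and tokenization is assumed to have been done already.

```python
from collections import Counter, defaultdict

def keyword_counts_by_decade(articles):
    """Count keyword frequencies per decade.

    `articles` is an iterable of (year, keywords) pairs, where `keywords`
    is a list of already-tokenized terms (an assumed input format).
    """
    counts = defaultdict(Counter)
    for year, keywords in articles:
        decade = (year // 10) * 10  # e.g. 1997 -> 1990
        counts[decade].update(keywords)
    return dict(counts)

# Hypothetical toy data, not taken from the study.
articles = [
    (1993, ["food", "diet"]),
    (1998, ["food", "cosmetics"]),
    (2005, ["diet", "diet", "hospital"]),
    (2016, ["online", "diet"]),
]
by_decade = keyword_counts_by_decade(articles)
top_1990s = by_decade[1990].most_common(1)  # most frequent keyword of the 1990s
```

Comparing the `most_common` lists of successive decades gives a simple view of which keywords rise or fade over time.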

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the disease caused by severe acute respiratory syndrome coronavirus 2) were published. This rapid growth puts time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. In this study, we therefore propose a method of extracting useful information from the text of an extensive body of literature using the LDA and Word2vec algorithms. Papers related to the search keywords were extracted from the COVID-19 papers, and detailed topics were identified. The data came from the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic and updated weekly. The research method has two main parts. First, 41,062 articles were retained after data filtering and pre-processing of the abstracts of 47,110 academic papers that include full text. The number of COVID-19 publications by year was analyzed through exploratory data analysis in Python, and the top 10 journals with active research were identified. The LDA and Word2vec algorithms were used to derive research topics related to COVID-19, and similarity was measured after analyzing related words. Second, papers containing 'vaccine' and 'treatment' were extracted from the topics derived from all papers, yielding 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each set of papers, detailed topics were analyzed using the LDA and Word2vec algorithms, and clustering after PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy result is that topics that did not emerge from the topic modeling of all COVID-19 papers did emerge from the topic modeling of each research topic. For example, topic modeling of the papers related to 'vaccine' extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and it is said to play an important role in the production of therapeutic agents and in vaccine development. Likewise, extracting topics from the papers related to 'treatment' revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells attack normal cells instead of defending the body. To uncover hidden topics that could not be found across the entire corpus, papers were classified by keyword and topic modeling was performed on each group to find detailed topics. In this study, we proposed a method of extracting topics from a large body of literature using the LDA algorithm and extracting similar words using Skip-gram, the Word2vec model that predicts surrounding words from the central word. Combining the LDA and Word2vec models was intended to improve performance by capturing both the document-topic relationships identified by LDA and the relationships identified by Word2vec. In addition, as a clustering method after PCA dimension reduction, we presented a way to classify documents intuitively by using the t-SNE technique to group documents with similar themes into a structured organization. At a time when researchers working to overcome COVID-19 cannot keep up with the rapid publication of related papers, we hope this method will save healthcare professionals and policy makers precious time and effort and help them gain new insights quickly. It is also expected to serve as basic data for researchers exploring new research directions.
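As an illustration of the Skip-gram idea mentioned in this abstract, the sketch below generates the (center, context) training pairs that a Skip-gram model learns from. It is not the authors' implementation: the function name `skipgram_pairs`, the window size, and the example tokens are hypothetical, and the neural network training itself is omitted.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) training pairs for a Skip-gram model.

    Each word is paired with every other word within `window`
    positions of it; the model is trained to predict the context
    word given the center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical token sequence, not taken from the CORD-19 data.
tokens = ["neutralizing", "antibody", "vaccine", "development"]
pairs = skipgram_pairs(tokens, window=1)
```

Words that share many context words end up with similar vectors, which is what makes the trained embeddings usable for finding similar words.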

    Analysis of Research Topics among Library, Archives and Museums using Topic Modeling (토픽 모델링을 활용한 도서관, 기록관, 박물관간의 연구 주제 분석)

    • Kim, Heesop;Kang, Bora
      • Journal of Korean Library and Information Science Society / v.50 no.4 / pp.339-358 / 2019
    • The purpose of this study is to understand research topics as a basis for establishing a cooperative platform between libraries, archives, and museums, which share the common task of providing knowledge and information in a broad sense. To this end, 637 bibliographic records on the three types of institution were collected from the Web version of the Scopus database. From the collected records, 5,218 words were extracted with NetMiner V.4 and analyzed using topic modeling. The results are as follows. First, in the analysis of word frequency by tf-idf weight, 'Preservation' was the hottest topic. Second, topic modeling with the LDA (Latent Dirichlet Allocation) algorithm yielded 13 topic areas. Third, when the 13 topic areas were expressed as a network, repository construction was the central topic, and research topics such as cooperation among institutions, conservation environment for collections, system and policy discovery, life cycle of collections, exhibition of information resources, and information retrieval were closely related to it. Fourth, the yearly trend of the 13 topic areas shows that research up to 1998 was limited to specific subjects such as system and policy discovery, information retrieval, and life cycle of collections, while studies on the other topics were carried out after that year.
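The tf-idf weighting mentioned in this abstract can be sketched as below. This is one common variant (raw term frequency multiplied by `log(N / df)`); the exact formula used by NetMiner is not stated in the abstract, so the formula, the function name `tf_idf`, and the toy documents are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute tf-idf weights for a list of tokenized documents.

    Uses raw term frequency and idf = log(N / df); other variants exist.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each document counts once per term
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy documents, not the Scopus records used in the study.
docs = [
    ["preservation", "preservation", "archive"],
    ["museum", "exhibition", "archive"],
    ["library", "retrieval", "archive"],
]
weights = tf_idf(docs)
```

A term that appears in every document, such as "archive" here, receives a weight of zero, while a term concentrated in few documents is weighted up.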

    Big Data Analysis of Busan Civil Affairs Using the LDA Topic Modeling Technique (LDA 토픽모델링 기법을 활용한 부산시 민원 빅데이터 분석)

    • Park, Ju-Seop;Lee, Sae-Mi
      • Informatization Policy / v.27 no.2 / pp.66-83 / 2020
    • Local issues that occur in cities typically garner great attention from the public. While local governments strive to resolve these issues, it is often difficult to eliminate them all effectively, which leads to complaints. In tackling these issues, it is imperative for local governments to use big data to identify the nature of complaints and proactively provide solutions. This study applies the LDA topic modeling technique to analyze trends and patterns in complaints filed online. To this end, 9,625 online complaints submitted to the city of Busan from 2015 to 2017 were analyzed, and 20 topics were identified. From these, key topics were singled out, and analysis of quarterly weighting trends highlighted four "hot" topics (Bus stops, Taxi drivers, Praises, and Administrative handling) and four "cold" topics (CCTV installation, Bus routes, Park facilities including parking, and Festivities issues). The study applies big data analysis to identify trends and patterns in civil affairs and contributes academically by encouraging follow-up research. Moreover, the text mining technique used for complaint analysis can be applied to other projects requiring big data processing.
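One plausible way to label topics as "hot" or "cold" from quarterly weights is to fit a straight line to each topic's series and look at the sign of the slope. The abstract does not state the criterion the authors used, so the slope rule, the function names `trend_slope` and `label_topics`, the threshold, and the numbers below are all assumptions for illustration.

```python
def trend_slope(weights):
    """Least-squares slope of a series against its index 0, 1, 2, ...

    Requires at least two points.
    """
    n = len(weights)
    mean_x = (n - 1) / 2
    mean_y = sum(weights) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(weights))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def label_topics(quarterly_weights, threshold=0.0):
    """Label each topic 'hot' (rising), 'cold' (falling), or 'flat'."""
    labels = {}
    for topic, series in quarterly_weights.items():
        slope = trend_slope(series)
        if slope > threshold:
            labels[topic] = "hot"
        elif slope < -threshold:
            labels[topic] = "cold"
        else:
            labels[topic] = "flat"
    return labels

# Hypothetical quarterly topic weights in percent, not the study's data.
quarterly_weights = {
    "Bus stops": [4, 5, 7, 8],
    "CCTV installation": [9, 7, 6, 4],
    "Other": [5, 5, 5, 5],
}
labels = label_topics(quarterly_weights)
```

In practice a non-zero `threshold` or a significance test on the slope would help avoid labeling noise as a trend.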

    Text Mining Driven Content Analysis of Ebola on News Media and Scientific Publications (텍스트 마이닝을 이용한 매체별 에볼라 주제 분석 - 바이오 분야 연구논문과 뉴스 텍스트 데이터를 이용하여 -)

    • An, Juyoung;Ahn, Kyubin;Song, Min
      • Journal of the Korean Society for Library and Information Science / v.50 no.2 / pp.289-307 / 2016
    • Infectious diseases such as Ebola virus disease become social issues and draw public attention, making them major topics in news and research. As a result, there have been many studies on infectious diseases using text-mining techniques. However, there is no research comparing the content of two media channels with distinct characteristics. Accordingly, in this study, we conduct topic analysis of news (representing a social perspective) and academic research papers (representing the perspectives of bio-professionals). As text-mining techniques, topic modeling is applied to extract the various topics in each type of material, and a word co-occurrence map based on selected bio entities is used to compare their perspectives in detail. For network analysis, a topic map is built using Gephi. These approaches uncovered differences in topics between the two types of material as well as the characteristics of each. In the word co-occurrence map, however, most entities are shared by both. These results indicate that there are both differences and commonalities between social and academic materials.
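A word co-occurrence map of the kind described here starts from counts of how often pairs of selected entities appear together. The sketch below counts co-occurrence at the sentence level; the unit of co-occurrence, the function name `cooccurrence_counts`, and the example data are assumptions, since the abstract does not specify them.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences, entities):
    """Count how often pairs of selected entities share a sentence.

    `sentences` is a list of token lists; pairs are stored in
    alphabetical order so (a, b) and (b, a) are counted together.
    """
    selected = set(entities)
    counts = Counter()
    for tokens in sentences:
        present = sorted(selected.intersection(tokens))
        counts.update(combinations(present, 2))
    return counts

# Hypothetical tokenized sentences, not the study's news or paper data.
sentences = [
    ["ebola", "virus", "outbreak", "liberia"],
    ["ebola", "vaccine", "trial"],
    ["virus", "vaccine", "ebola"],
]
entities = ["ebola", "virus", "vaccine"]
counts = cooccurrence_counts(sentences, entities)
```

The resulting weighted pairs form an edge list, which is the kind of input a network tool such as Gephi can visualize.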

    A Study on the Research Trends on Domestic Platform Government using Topic Modeling (토픽 모델링을 활용한 한국의 플랫폼정부 연구동향 분석)

    • Suh, Byung-Jo;Shin, Sun-Young
      • Informatization Policy / v.24 no.3 / pp.3-26 / 2017
    • The amount of unstructured data generated online is increasing exponentially, and text data is being analyzed in various fields. To identify research trends on platform government, the title, year, academic society, and abstract of academic papers on the subject were collected from DBPIA (www.dbpia.co.kr), a database of domestic papers. Existing research on platform government and related fields was analyzed for each stage of national informatization promotion. Technology, service, and governance topics were extracted from the papers on platform government, and the trends of core topics were analyzed by year. As we enter the era of the intelligent information society, this study is significant in providing a basis for defining a new role of government: the platform government, which sets the stage for the private sector to lead innovation and acts as an 'enabler' and 'facilitator' instead. The purpose of this study is to understand platform government research through objective analysis of its trends. It will contribute to future research by providing reference material on future directions.

    Analyzing Research Trends of Domestic Artificial Intelligence Research Using Network Analysis and Dynamic Topic Modelling (네트워크 분석과 동적 토픽모델링을 활용한 국내 인공지능 분야 연구동향 분석)

    • Jung, Woojin;Oh, Chanhee;Zhu, Yongjun
      • Journal of the Korean Society for Library and Information Science / v.55 no.4 / pp.141-157 / 2021
    • In this study, we aimed to understand research trends in domestic artificial intelligence research. To that end, we applied network analysis and dynamic topic modeling to domestic research papers on artificial intelligence. Among the papers indexed in KCI (Korea Citation Index) by 2020, we collected the metadata and abstracts of 2,552 papers whose titles or indexed keywords include 'artificial intelligence' in Korean and English. Keywords, affiliations, subject fields, and abstracts were extracted and preprocessed for further analysis. We identified the main keywords in the field by analyzing keyword co-occurrence networks, and the degree and characteristics of research collaboration between domestic and foreign institutions and between industry and universities by analyzing institutional collaboration networks. Dynamic topic modeling was performed on 1,845 abstracts written in Korean, and 13 topics were obtained from the labeling process. This study broadens the understanding of domestic artificial intelligence research by identifying research trends through dynamic topic modeling of abstracts and by characterizing research collaboration through institutional collaboration networks built from author affiliation information. In addition, the results can be used by governmental institutions to make policies suited to the artificial intelligence era.

    Problem Identification and Improvement Measures through Government24 App User Review Analysis: Insights through Topic Model (정부24 앱 사용자 리뷰 분석을 통한 문제 파악 및 개선방안: 토픽 모델을 통한 통찰)

    • MuMoungCho Han;Mijin Noh;YangSok Kim
      • Smart Media Journal / v.12 no.11 / pp.27-35 / 2023
    • The Fourth Industrial Revolution and the COVID-19 pandemic have boosted the use of the Government 24 app for public service complaints in an era of non-face-to-face interactions. At the same time, there has been a growing influx of complaints and improvement demands from users of public apps, and systematic management of public apps is deemed necessary. The aim of this study is to analyze the grievances of Government 24 app users, understand the current dissatisfaction among citizens, and propose potential improvements. Data were collected from the Google Play Store for the period from May 2, 2013, to June 30, 2023, comprising a total of 6,344 records. Among these, 1,199 records with a rating of 1 and at least one 'thumbs-up' were used for topic modeling analysis. The analysis revealed seven topics: 'Issues with certificate issuance,' 'Website functionality and UI problems,' 'User ID-related issues,' 'Update problems,' 'Government employee app management issues,' 'Budget wastage concerns' (for example, "It's not worth even a single star" or "It's a waste of taxpayers' money"), and 'Password-related problems.' Furthermore, the overall trend of these topics showed an increase until 2021, a slight decrease in 2022, and a resurgence in 2023, underscoring the urgency of updates and management. We hope that the results of this study will contribute to the development and management of public apps that satisfy citizens in the future.

    PoMEN based Latent One-Class SVM (PoMEN 기반의 Latent One-Class SVM)

    • Lee, Changki
      • Annual Conference on Human and Language Technology / 2012.10a / pp.8-11 / 2012
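    • One-class SVM extracts the region in which the data lie and represents that region with support vectors; data outside the represented region are regarded as outliers. In this paper, we assume that each data point has a hidden variable, or topic, and to reflect this we propose a Latent One-class SVM based on PoMEN. Experimental results show that the Latent One-class SVM outperformed the One-class SVM over most of the range and was especially effective when high precision is required.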


    A Study on Opinion Mining of Newspaper Texts based on Topic Modeling (토픽 모델링을 이용한 신문 자료의 오피니언 마이닝에 대한 연구)

    • Kang, Beomil;Song, Min;Jho, Whasun
      • Journal of the Korean Society for Library and Information Science / v.47 no.4 / pp.315-334 / 2013
    • This study performs opinion mining of newspaper articles based on topics extracted by topic modeling. We analyze the attitudes of the news media towards a major issue, the 'presidential election', assuming that newspaper partisanship is a kind of opinion. We first extract topics from a large collection of newspaper texts and examine how the topics are distributed over the entire dataset. The structure and content of each topic are then investigated by means of network analysis. Finally, we track the chronological distribution of the topics in each newspaper through time series analysis. The results reveal that both the liberal and the conservative newspapers tend to report in line with their adopted ideology. This confirms that topic-based opinion mining can be relied on to analyze opinion.

