• Title/Summary/Keyword: news big data


A MVC Framework for Visualizing Text Data (텍스트 데이터 시각화를 위한 MVC 프레임워크)

  • Choi, Kwang Sun;Jeong, Kyo Sung;Kim, Soo Dong
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.39-58
    • /
    • 2014
  • As big data and related technologies grow in importance across industry, visualizing the results of big data processing and analysis has become a priority. Visualization makes analysis results clearer and easier to understand, and it also serves as the GUI (Graphical User Interface) through which people communicate with analysis systems. To ease development and maintenance, these GUI parts should be loosely coupled from the data processing and analysis parts, and achieving that calls for design patterns such as MVC (Model-View-Controller), which is designed to minimize coupling between the UI and data processing. Big data can be classified as structured or unstructured, and structured data is relatively easy to visualize compared with unstructured data. As the use and analysis of unstructured data has spread, however, visualization systems have typically been built separately for each project to work around the limitations of traditional visualization systems for structured data. Text data, which makes up a large share of unstructured data, is even harder to visualize because the technologies for analyzing it, such as linguistic analysis, text mining, and social network analysis, are complex and not standardized. This makes it difficult to reuse one project's visualization system in other projects. We attribute this to the lack of a common design that anticipates extension to other systems. In this research, we suggest a common information model for visualizing text data and propose a comprehensive and reusable framework, TexVizu. First, we survey representative research in the text visualization area and identify common elements and common patterns across the cases. We then review and analyze these elements and patterns from three viewpoints (structural, interactive, and semantic) and design an integrated model of text data that represents the elements for visualization. The structural viewpoint identifies structural elements of text documents such as title, author, and body. The interactive viewpoint identifies the types of relations and interactions between text documents, such as post, comment, and reply. The semantic viewpoint identifies semantic elements that are extracted by linguistic analysis of the text and represented as tags classifying entity types such as person, place or location, time, and event. Next, we extract and select common requirements for visualizing text data. The requirements fall into four types: structure information, content information, relation information, and trend information. Each type comprises the required visualization techniques, data, and goal (what to know). These are the key common requirements for designing a framework that keeps a visualization system loosely coupled from the data processing or analysis system. Finally, we designed a common text visualization framework, TexVizu, which is reusable and extensible across visualization projects by collaborating with various Text Data Loaders and Analytical Text Data Visualizers via common interfaces such as ITextDataLoader and IATDProvider. TexVizu comprises the Analytical Text Data Model, Analytical Text Data Storage, and Analytical Text Data Controller. In this framework, external components are specified by the interfaces required to collaborate with it. As an experiment, we applied the framework to two text visualization systems: a social opinion mining system and an online news analysis system.
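
The loose coupling described in this abstract can be illustrated with a minimal Python sketch. Only the names ITextDataLoader, IATDProvider, and Analytical Text Data Controller come from the abstract; every method, field, and helper class below is a hypothetical stand-in, not the paper's actual design.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class TextDocument:
    # Structural elements named in the abstract: title, author, body.
    title: str
    author: str
    body: str
    # Semantic tags (e.g. person, place); this dict layout is an assumption.
    tags: dict = field(default_factory=dict)


class ITextDataLoader(ABC):
    """Interface name from the abstract; the method is hypothetical."""

    @abstractmethod
    def load(self):
        ...


class IATDProvider(ABC):
    """Interface name from the abstract; the method is hypothetical."""

    @abstractmethod
    def documents_by_author(self, author):
        ...


class AnalyticalTextDataController(IATDProvider):
    """Sits between loaders and visualizers so neither knows the other."""

    def __init__(self, loader):
        # A plain list stands in for the Analytical Text Data Storage.
        self._storage = loader.load()

    def documents_by_author(self, author):
        return [d for d in self._storage if d.author == author]


class InMemoryLoader(ITextDataLoader):
    """Hypothetical loader; a real one might read a crawl or a database."""

    def __init__(self, docs):
        self._docs = docs

    def load(self):
        return list(self._docs)


def render_titles(provider, author):
    # A "view": it depends only on the provider interface.
    return "\n".join(d.title for d in provider.documents_by_author(author))
```

Swapping InMemoryLoader for another loader, or render_titles for a chart renderer, would leave the controller untouched, which is the property the abstract attributes to MVC.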

Trend Analysis of Fraudulent Claims by Long Term Care Institutions for the Elderly using Text Mining and BIGKinds (텍스트 마이닝과 빅카인즈를 활용한 노인장기요양기관 부당청구 동향 분석)

  • Youn, Ki-Hyok
    • Journal of Internet of Things and Convergence
    • /
    • v.8 no.2
    • /
    • pp.13-24
    • /
    • 2022
  • To explore the context of fraudulent claims by long-term care institutions for the elderly, which are increasing every year in Korea, and measures for preventing them, this study conducted a text mining analysis of media reports. The articles were collected from the news big data analysis system 'BIG KINDS' for about 15 years, from July 2008, when the Long-Term Care Insurance for the Elderly took effect, to February 28, 2022. A total of 2,627 articles were collected under keywords such as 'elderly care+fraudulent claims' and 'long-term care+fraudulent claims', and 946 were selected after excluding duplicates. The results were as follows. First, the top 10 keywords by frequency over the whole period (July 1, 2008 to February 28, 2022) were, in order, long-term care institution for the elderly, fraudulent claims, National Health Insurance Service, Long-Term Care Insurance for the Elderly, long-term care benefits (expenses), elderly care facilities, the Ministry of Health & Welfare, the elderly, report, and reward (payment). Second, the N-gram analysis ranked the pairs in the order of long-term care benefits (expenses) and fraudulent claims, fraudulent claims and long-term care institution for the elderly, falsehood and fraudulent claims, report and reward (payment), and long-term care institution for the elderly and report. Third, the TF-IDF results were similar to those of the frequency analysis, while report, reward (payment), and increase moved up in rank. Based on these results, the study presents future directions for preventing fraudulent claims by long-term care institutions for the elderly.
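
The N-gram and TF-IDF steps named in this abstract can be sketched as follows. The abstract does not state which weighting variant the study used; this sketch assumes tf = count / document length and idf = log(N / df), and the function names are illustrative.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Adjacent token pairs, the N-gram case with N = 2."""
    return list(zip(tokens, tokens[1:]))


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document.

    Assumes tf = count / doc length and idf = log(N / df); other variants exist.
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return scores
```

Under this weighting, a term that occurs in every document scores zero, which is one plausible reason a TF-IDF ranking can lift less ubiquitous terms relative to a raw frequency ranking.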

Digital Healthcare Research Trend based on Social Media Data (소셜미디어 데이터에 기반한 디지털 헬스케어 연구 동향)

  • Lee, Taekkyeun
    • The Journal of the Korea Contents Association
    • /
    • v.20 no.3
    • /
    • pp.515-526
    • /
    • 2020
  • Digital healthcare is a field that combines medicine and IT, and a variety of information on it is available through social media. This study aims to identify research trends in digital healthcare by collecting and analyzing related social media data. The data were collected from news and blogs on Naver and Daum from January 2008 to June 2019. High-frequency keywords were extracted and visualized with a word cloud, and network analysis was used to examine the relationships between them. The results show that research combining the medical field and IT was active from 2008 to 2011, various convergence research based on the medical field and IT from 2012 to 2015, and convergence research applying Fourth Industrial Revolution technologies such as big data, blockchain, and AI from 2016 to June 2019.
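
One common way to build the keyword network mentioned in this abstract is to count how often pairs of keywords occur in the same document. The abstract does not say how the study defined its edges, so the document-level co-occurrence rule below is an assumption.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs, keywords):
    """Count how many documents contain each pair of keywords.

    docs: list of token lists; keywords: the major keywords to connect.
    Returns a Counter mapping (keyword_a, keyword_b) to a document count,
    with each pair stored in alphabetical order.
    """
    keywords = set(keywords)
    edges = Counter()
    for tokens in docs:
        present = sorted(set(tokens) & keywords)
        edges.update(combinations(present, 2))
    return edges
```

The resulting counts can serve as edge weights when the network is drawn or analyzed.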

An Efficient Damage Information Extraction from Government Disaster Reports

  • Shin, Sungho;Hong, Seungkyun;Song, Sa-Kwang
    • Journal of Internet Computing and Services
    • /
    • v.18 no.6
    • /
    • pp.55-63
    • /
    • 2017
  • One purpose of Information Technology (IT) is to support human responses to natural and social problems, such as natural disasters and the spread of disease, and to improve the quality of human life. With climate change occurring worldwide, natural disasters threaten the quality of life, and human safety is no longer guaranteed. IT must be able to support disaster response tasks and, more importantly, should be used to predict and minimize future damage. In South Korea, damage data is collected by each local government and then aggregated by the federal government. This data is included in the disaster reports that the federal government discloses for each disaster, but the raw damage data is difficult to obtain even for research purposes. Information extraction can be applied to the disaster reports to obtain it. Most information extraction work targets web documents, commercial reports, SNS text, and the like, and there is little research on government disaster reports. These reports are mostly text, but their sentence structure differs greatly from that of news articles and commercial reports, so their features must be considered carefully. This paper presents an information extraction method for South Korean government reports in the word format. The method is based on patterns and dictionaries and adds some ideas for tokenizing the damage expressions in the text. It achieves an F1 score of 80.2 on the test set, which is close to the best information extraction performance reported before recent deep learning algorithms were applied.
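
A pattern-and-dictionary extractor of the kind this abstract describes, together with the F1 measure it reports, can be sketched as below. The dictionary entries, the regular expression, and the English example sentence are all hypothetical; the paper's own patterns target Korean government reports and are not given in the abstract.

```python
import re

# Hypothetical dictionary mapping surface terms to damage categories.
DAMAGE_TERMS = {"houses": "housing", "roads": "road"}

# Hypothetical pattern: a number (with optional commas) followed by a dictionary term.
PATTERN = re.compile(
    r"(\d[\d,]*)\s+(" + "|".join(map(re.escape, DAMAGE_TERMS)) + r")\b"
)


def extract_damage(sentence):
    """Return (category, quantity) pairs matched by the pattern."""
    return [
        (DAMAGE_TERMS[term], int(num.replace(",", "")))
        for num, term in PATTERN.findall(sentence)
    ]


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over sets of extracted items."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Scoring the extractor's output against hand-labeled pairs with f1_score mirrors, in miniature, how a test-set F1 figure is obtained.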

Correlation Analysis between News Articles and Music Charts using Big Data Technologies based on R (R 기반의 빅데이터 기술을 활용한 뉴스기사와 음원차트의 상관관계 분석)

  • Ha, Jung-chul;Kang, Dong-hoon;Park, Jae-mo;Gil, Joon-Min
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2016.10a
    • /
    • pp.636-639
    • /
    • 2016
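  • Among news articles, which are a kind of big data, articles about idol groups account for a growing share of all entertainment news thanks to the groups' popularity. Idol groups' agencies try to improve music chart rankings by promoting at relatively low cost through news exposure, one of several promotion methods. To determine whether news exposure is an efficient means of promotion, this paper analyzes the correlation between news articles and music charts: it first uses sentiment analysis to examine the effect of positive and negative articles on chart rankings, and then examines whether chart rankings rise as the number of news articles increases. To this end, the paper uses the R language to design a web crawler for data collection, to build a sentiment dictionary using regression analysis and perform sentiment analysis, and finally to perform correlation analysis using the correlation coefficient given in the source as "피어스만" (apparently Spearman).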
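
The final correlation step of this abstract can be sketched as follows. The abstract's name for the coefficient is ambiguous, so this sketch assumes a Spearman rank correlation, computed as the Pearson correlation of the ranks; if a plain Pearson coefficient was meant, pearson can be called on the raw values. The paper itself works in R, and Python is used here only for illustration.

```python
def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Because a smaller chart position means a better ranking, a negative coefficient between article counts and chart positions would indicate that more coverage goes with a better position.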

Study of Policy through Big data Analysis about Gambling News (사행산업 관련 뉴스의 빅데이터 분석을 통한 정책 연구)

  • Moon, HyeJung;Kim, SungKyung
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2016.11a
    • /
    • pp.190-193
    • /
    • 2016
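  • This study examines how the press has covered the gambling industry sectors of lottery, sports promotion betting tickets, horse racing, and casinos, by applying semantic network analysis of text, one of the big data analysis methods, to news data from 1990 to 2015. Through semantic network analysis, the paper reinterprets article frequency and connectivity as the degree of framing and of citizen interest, reveals the gap between reporters' intentions and citizens' perceptions, and on that basis explores policy characteristics and reform tasks. The analysis shows that lottery coverage took the form of 'social problem' news centered on winning, with keywords such as winning numbers, prize money, and suspected manipulation; sports promotion betting ticket coverage was of the 'obligatory information' type, mainly about business operation and illegal sites, with keywords such as business bidding, illegal sites, and sales targets; horse racing coverage was business promotion or advertising news, with keywords such as business sites, promotion, and articles; and casino coverage corresponded to 'major information', with keywords such as illegality, gambling houses, and foreigners. By era, coverage increased for casinos in the 1990s, lotteries in the 2000s, and horse racing in the 2010s, and citizens' reactions also differed, for example business corruption, winning, and civic movements. Finally, the degree of framing indicated by article frequency and connectivity, together with citizen interest, was divided into four categories: '1. promotion and advertising, 2. obligatory information, 3. social issues, 4. major information'. Of these, social problems classed as major articles, such as accidents and corruption, were confirmed to form the main public agenda.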


A Methodology for Analyzing Public Opinion about Science and Technology Issues Using Text Analysis (텍스트 분석을 활용한 과학기술이슈 여론 분석 방법론)

  • Kim, Dasom;Wong, William Xiu Shun;Lim, Myungsu;Liu, Chen;Kim, Namgyu;Park, Junhyung;Kil, Wooyeong;Yoon, Hansool
    • Journal of Information Technology Services
    • /
    • v.14 no.3
    • /
    • pp.33-48
    • /
    • 2015
  • Many users now share their opinions on diverse issues through social media, and many governments have accordingly attempted to establish or improve national policies based on the public opinion captured there. In this paper, we point out several limitations of traditional approaches to analyzing public opinion about science and technology and provide an alternative methodology to overcome them. First, we separate the science and technology analysis phase from the social issue analysis phase, reflecting the fact that public opinion forms only when a given science or technology is applied to a specific social issue. Next, we apply a start list and a stop list in succession to obtain clearer and more interesting results. Finally, to identify the documents that best fit a given subject, we develop a new concept, the logical filter, which consists not only of keywords but also of logical relationships among them. The study then assesses the practical usefulness of the proposed methodology by applying it to discover core issues and public opinions from 1,700,886 documents comprising SNS posts, blogs, news, and discussions.
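
The start list, stop list, and logical filter named in this abstract can be sketched as below. The abstract does not define the filter's form, so the AND / OR / NOT structure here is an assumption, as are all function and parameter names.

```python
def apply_lists(tokens, start_list, stop_list):
    """Keep tokens on the start list, then drop those on the stop list."""
    kept = [t for t in tokens if t in start_list]
    return [t for t in kept if t not in stop_list]


def logical_filter(tokens, all_of=(), any_of=(), none_of=()):
    """True if the document satisfies AND / OR / NOT keyword conditions.

    all_of:  every keyword must be present (AND).
    any_of:  at least one must be present, if any are given (OR).
    none_of: no keyword may be present (NOT).
    """
    present = set(tokens)
    return (
        all(k in present for k in all_of)
        and (not any_of or any(k in present for k in any_of))
        and not any(k in present for k in none_of)
    )
```

For example, a filter such as all_of=["drone"], any_of=["privacy", "safety"] would keep documents that discuss drones in connection with a social issue and skip those that merely mention the technology.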

News Big Data Analysis System for Public Issue Extraction (공공이슈 추출을 위한 뉴스 빅데이터 분석 시스템)

  • Kim, Seung Ju;Yoon, Chang Geun;Lee, Cha Hun;Park, Dong Hwan;Lee, Hae Jun;Park, Hyeok Ju;Lee, Yong Kyu
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2018.10a
    • /
    • pp.17-20
    • /
    • 2018
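  • Research is under way that analyzes various kinds of big data to identify public issues, the matters of public interest. However, existing studies reflect only the number of times a keyword is exposed. This paper proposes an analysis method that uses news big data by media outlet, obtained from portal sites, and reflects each keyword's exposure frequency, number of comments, and number of recommendations. The keywords obtained by extracting public issues are visualized as word clouds and Sankey diagrams and provided to users. With the proposed method, users can see analysis results that reflect public reaction.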
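
A score that combines exposure frequency with comment and recommendation counts, as this abstract proposes, might look like the sketch below. The linear form and the weight values are assumptions made for illustration; the abstract does not give the paper's actual formula.

```python
from collections import defaultdict


def keyword_scores(articles, w_freq=1.0, w_comment=0.5, w_recommend=0.5):
    """Rank keywords by exposure plus weighted public reaction.

    articles: dicts with 'keywords', 'comments', and 'recommendations'.
    The linear weighting is an assumption, not the paper's formula.
    """
    scores = defaultdict(float)
    for a in articles:
        # Each article counts once per keyword, however often it repeats.
        for kw in set(a["keywords"]):
            scores[kw] += (
                w_freq
                + w_comment * a["comments"]
                + w_recommend * a["recommendations"]
            )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With weights like these, a keyword that appears in one heavily discussed article can outrank a keyword that appears in several articles nobody reacted to, which is the effect the abstract contrasts with frequency-only analysis.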

A Study on Leadership Typology in Sports Leaders Based on Big Data Analysis (빅데이터 분석을 활용한 스포츠 지도자들의 리더십 유형에 관한 연구)

  • Park, Eun-Mi;Seo, Joung-Hae
    • Journal of the Korea Convergence Society
    • /
    • v.10 no.7
    • /
    • pp.191-198
    • /
    • 2019
  • This paper investigates the types of leadership shown by foreign coaches who led the Korean national soccer team. To that end, news articles published during each coach's tenure were crawled and analyzed. The analysis yielded two results. First, successful sports leaders showed their own specific types of leadership. Second, failed sports leaders also showed specific types of leadership. The leadership types identified in the analysis have practical implications in that they suggest the kinds of effective leadership required of sports leaders in managing and leading athletes while producing tangible results and performance.

Modeling Domestic News Topics for Mongolia: Focusing on Changes in Press on Diplomatic Relations between the two countries after the establishment of Diplomatic ties between Korea and Mongolia (몽골에 대한 국내 뉴스 토픽 모델링: 한몽 수교 이후 양국 관계 보도 양상 변화를 중심으로)

  • Yoon, Ji-Soo;Jin, XianMei
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.4
    • /
    • pp.37-46
    • /
    • 2022
  • In this study, big data analysis was conducted on domestic media reports related to Mongolia. Latent Dirichlet Allocation (LDA) topic modeling was performed on 130,000 articles containing the keyword 'Mongolia'. Deriving and examining the major topics for each period showed that some subjects disappeared as the level of diplomatic relations was raised, but most topics that appeared early on remained, and additional issues in various fields emerged.