• Title/Summary/Keyword: news data

News Video Shot Boundary Detection using Singular Value Decomposition and Incremental Clustering (특이값 분해와 점증적 클러스터링을 이용한 뉴스 비디오 샷 경계 탐지)

  • Lee, Han-Sung;Im, Young-Hee;Park, Dai-Hee;Lee, Seong-Whan
    • Journal of KIISE:Software and Applications / v.36 no.2 / pp.169-177 / 2009
  • In this paper, we propose a new shot boundary detection method optimized for news video story parsing. The method was designed to satisfy the following requirements: 1) minimizing incorrect data in the data set used for anchor shot detection by improving the recall ratio; 2) detecting abrupt cuts and gradual transitions with a single algorithm, so that a news video can be divided into shots in one scan of the data set; and 3) classifying shots as static or dynamic, thereby reducing the search space for the subsequent anchor shot detection stage. The proposed method, based on singular value decomposition with incremental clustering and a Mercer kernel, has additional desirable features. Applying singular value decomposition removes noise and trivial variations in the video sequence, which improves separability. The Mercer kernel maps the data to a high-dimensional feature space, which makes it more likely that shots not separable in the input space are detected. The experimental results illustrate the superiority of the proposed method with respect to recall and search space reduction for anchor shot detection.

A study on the effect of tax evasion controversy on corporate values in internet news portals through big data analysis (빅데이터 분석을 통한 인터넷 뉴스 포털에서의 탈세 논란이 기업 가치에 미치는 영향 연구)

  • Lee, Sang-Min;Park, Myung-Ho;Kim, Byung-Jun;Park, Dae-Keun
    • Journal of Internet Computing and Services / v.22 no.6 / pp.51-57 / 2021
  • If the tax authorities judge a company's actions to save or avoid taxes to be tax evasion rather than legal tax planning, the company bears not only the tax itself but also non-tax costs, such as damage to its corporate image and a decline in its stock price caused by a series of news articles about the tax evasion. This study therefore measures how often keywords related to tax evasion controversies appear on an internet news portal, uses that frequency as a measure of the severity of each case, and analyzes its effect on corporate value. For the top companies by market capitalization in the Korean stock market, we crawl related articles from an internet news portal using keywords associated with tax evasion controversies, generate a per-company time series of keyword frequency, and analyze the effect of that frequency on book value versus market capitalization. Panel regression and impulse response analysis show that the frequency of appearance has a negative effect on market capitalization and that the effect gradually decreases over a period of up to 12 months. This study examines whether tax evasion issues affect the corporate value of Korean companies and suggests that entrepreneurs need to take these effects into account when they set up tax-planning schemes.

A Study on the Agenda Rank-Order Correlation between Twitter and Portal News about Sewol Ferry Catastrophe (세월호 참사에 대한 트위터와 포털뉴스의 의제 순위 상관관계 연구)

  • Kim, Shin-Ku;Choi, Eun-Kyoung
    • Journal of Internet Computing and Services / v.16 no.3 / pp.105-116 / 2015
  • The Sewol ferry catastrophe of April 16, 2014 was unprecedented in its sociopolitical implications, which reverberated throughout Korea. Mindful of these distinct characteristics, this thesis examines the salience of the agendas portrayed in Twitter and Portal News coverage of the disaster, and the correlation between the attribute-specific agendas of the two media, using the agenda rank-order correlation method. Extraction and analysis of big data revealed, first, that while the hypothesis that there was little difference in salience among the main agendas between Twitter and Portal News was rejected, the rank-order correlation between the main agendas on Twitter and Portal News proved to be high. This signifies that Twitter agendas exert influence over those on Portal News. Second, for the five main agendas on the incident, the salience of the attribute-specific agendas differed between the two media, and the corresponding rank-order correlations were low. These results signify that Twitter and Portal News have little influence over each other as regards their attribute-specific agendas.
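The rank-order comparison described in this abstract can be illustrated with Spearman's rank correlation. The following is a minimal sketch, not the authors' procedure: the function names and the sample agenda counts are hypothetical, and the abstract does not state which rank correlation coefficient was used.

```python
def average_ranks(values):
    """Return 1-based ranks (largest value gets rank 1), averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical mention counts for five agendas on each medium (invented data).
twitter_counts = [50, 40, 30, 20, 10]
portal_counts = [45, 50, 20, 30, 5]
rho = spearman(twitter_counts, portal_counts)
```

A value of `rho` near 1 means the two media rank the agendas in a similar order, while a value near 0 means the orderings are largely unrelated.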

Analysis of News Agenda Using Text mining and Semantic Network Analysis: Focused on COVID-19 Emotions (텍스트 마이닝과 의미 네트워크 분석을 활용한 뉴스 의제 분석: 코로나 19 관련 감정을 중심으로)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.47-64 / 2021
  • The global spread of COVID-19 has not only affected many parts of daily life but has also had a huge impact on many areas, including the economy and society. As the number of confirmed cases and deaths increases, medical staff and the public are said to be experiencing psychological problems such as anxiety, depression, and stress. The collective tragedy that accompanies the epidemic raises fear and anxiety, which are known to cause enormous disruption to the behavior and psychological well-being of many people. Long-term negative emotions can reduce people's immunity and destroy their physical balance, so it is essential to understand people's psychological state during COVID-19. This study suggests a method of monitoring media news that reflects the current situation, in which the prolonged COVID-19 pandemic requires psychological as well as physical quarantine. It also presents how a simpler method of analyzing social media networks can be applied to such cases. The aim of the study is to assist health policymakers in fast and complex decision-making processes. News plays a major role in setting the policy agenda. Among the various major media, news headlines are considered important in communication science as a summary of the core content that the media want to convey to their audiences. The news data used in this study were collected using "Bigkinds", which was created by integrating big data technology. Keywords in the collected news data were classified through text mining, and the relationships between words were visualized through semantic network analysis. Text mining was performed with the KrKwic program, a Korean semantic network analysis tool, and word frequencies were calculated to identify keywords.
The frequency of words appearing in the keywords of articles related to COVID-19 emotions was checked and visualized in a word cloud. 'China', 'anxiety', 'situation', 'mind', 'social', and 'health' appeared frequently in relation to COVID-19 emotions. In addition, UCINET, a specialized social network analysis program, was used to analyze connection centrality and perform cluster analysis, and the graph was visualized using NetDraw. The analysis of connection centrality showed that the most central keywords in the keyword network were 'psychology', 'COVID-19', 'blue', and 'anxiety'. The network of co-occurrence frequencies among the keywords appearing in news headlines was visualized as a graph. The thickness of each line in the graph is proportional to the co-occurrence frequency, so two words that frequently appear together are connected by a thick line. The 'COVID-blue' pair is drawn with the thickest line, and the 'COVID-emotion' and 'COVID-anxiety' pairs are drawn with relatively thick lines. 'Blue' in relation to COVID-19 means depression, which confirms that COVID-19 and depression are keywords that deserve attention now. The research methodology used in this study makes it possible to measure social phenomena and changes quickly while reducing costs. By analyzing news headlines, we were able to identify people's feelings and perceptions on issues related to COVID-19 depression, and to identify the main agendas to be analyzed by deriving important keywords. By presenting and visualizing the subjects and important keywords related to COVID-19 emotions together, the study can provide medical policy managers with a variety of perspectives when they identify and research the relevant phenomena.
These results are expected to serve as basic data for support, treatment, and service development for psychological quarantine issues related to COVID-19.
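The co-occurrence network and connection (degree) centrality described above can be sketched in a few lines. This is only an illustration under stated assumptions: the study used KrKwic, UCINET, and NetDraw, whereas the function names and the three toy headlines here are hypothetical and the data are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(headlines):
    """Count how many headlines contain each unordered keyword pair."""
    pairs = Counter()
    for words in headlines:
        # Use a sorted set so each pair is counted once per headline.
        for a, b in combinations(sorted(set(words)), 2):
            pairs[(a, b)] += 1
    return pairs


def degree_centrality(pairs):
    """Normalized degree: share of the other keywords each keyword is linked to."""
    neighbors = {}
    for a, b in pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {word: len(linked) / (n - 1) for word, linked in neighbors.items()}


# Toy, already-tokenized headlines (invented; not the study's data).
headlines = [
    ["covid", "blue", "anxiety"],
    ["covid", "blue"],
    ["covid", "emotion"],
]
pairs = cooccurrence(headlines)
centrality = degree_centrality(pairs)
```

In a drawing of this network, the count stored for each pair would set the line thickness, so `("blue", "covid")` with a count of 2 would be the thickest edge in the toy example.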

HKIB-20000 & HKIB-40075: Hangul Benchmark Collections for Text Categorization Research

  • Kim, Jin-Suk;Choe, Ho-Seop;You, Beom-Jong;Seo, Jeong-Hyun;Lee, Suk-Hoon;Ra, Dong-Yul
    • Journal of Computing Science and Engineering / v.3 no.3 / pp.165-180 / 2009
  • The HKIB, or Hankookilbo, test collections are two archives of Korean newswire stories manually categorized with semi-hierarchical or hierarchical category taxonomies. The base newswire stories were made available by the Hankook Ilbo (The Korea Daily) for research purposes. First, Chungnam National University and KISTI collaborated to manually tag 40,075 news stories with categories using a semi-hierarchical, balanced three-level classification scheme in which each news story has exactly one level-3 category (single-labeling). We refer to this original data set as the HKIB-40075 test collection. Then Yonsei University and KISTI collaborated to select 20,000 newswire stories from the HKIB-40075 test collection, rearrange the classification scheme to be fully hierarchical but unbalanced, and assign one or more categories to each news story (multi-labeling). We refer to this modified data set as the HKIB-20000 test collection. We benchmark a k-NN categorization algorithm on both HKIB-20000 and HKIB-40075, illustrating properties of the collections, providing baseline results for future studies, and suggesting new directions for further research on the Korean text categorization problem.
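As a rough picture of what a k-NN text categorizer does in the single-label setting, the sketch below votes among the k most similar training documents. It assumes bag-of-words vectors with cosine similarity, which the abstract does not specify; the function names, the English tokens, and the category labels are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(train, doc_tokens, k=3):
    """Return the majority label among the k training documents most similar to the query."""
    query = Counter(doc_tokens)
    scored = sorted(
        ((cosine(Counter(tokens), query), label) for tokens, label in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Tiny invented training set of (tokens, category) pairs.
train = [
    (["stock", "market", "price"], "economy"),
    (["market", "bank", "rate"], "economy"),
    (["game", "team", "score"], "sports"),
    (["team", "coach", "league"], "sports"),
]
label = knn_classify(train, ["market", "price", "rate"], k=3)
```

For the multi-label setting of HKIB-20000, one common variation is to return every category whose vote share exceeds a threshold instead of only the top one.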

Strategy Design to Protect Personal Information on Fake News based on Bigdata and Artificial Intelligence

  • Kang, Jangmook;Lee, Sangwon
    • International Journal of Internet, Broadcasting and Communication / v.11 no.2 / pp.59-66 / 2019
  • The emergence of new IT technologies and convergence industries, such as artificial intelligence, big data, and the Internet of Things, is another opportunity for South Korea, which has established itself as one of the world's top IT powerhouses. However, privacy concerns that may arise in using such technologies make it necessary to harmonize the development of new industries with the protection of personal information. In response, the government clearly presented the criteria for de-identification measures for personal information and the scope of use of de-identified information, so that big data can be safely utilized within the framework of the current Personal Information Protection Act. It strives to promote corporate investment and industrial development by removing them and to ensure that the protection of the people's personal information and human rights is not neglected. This study discusses a strategy for de-identifying personal information, based on an analysis of fake news. Under the strategies derived from this study, information that has been appropriately de-identified is assumed not to be personal information and can therefore be used for big data analysis. De-identified information can then be safely utilized and managed through administrative and technical safeguards that prevent re-identification, taking into account the possibility of re-identification due to technological development and data growth.

A Study on Interest Issues Using Social Media News (소셜미디어 뉴스를 이용한 관심 이슈 연구)

  • Kwak, Noh Young;Lee, Moon Bong
    • The Journal of Information Systems / v.32 no.2 / pp.177-190 / 2023
  • Purpose: Recently, short-form content focused on fun and interest has been shared through hashtags as a new business marketing tool. This study extracts positive and negative keywords from media audiences through comment analysis of social media news, so that various stakeholders can quickly and easily grasp users' opinions on major news. Design/methodology/approach: YouTube videos were searched using the YouTube Data API and the results were collected. Video comments were crawled and rendered as HTML elements, and the collected results were checked on a web page. The collected data consisted of video thumbnails, titles, contents, and comments. Comments were tokenized into words with the R program and compared against positive and negative dictionaries to quantify polarity. In addition, social network analysis was conducted separately on the positive and negative comments, and the results of centrality analysis and visualization were confirmed. Findings: Social media users' opinions on issue news were confirmed by dividing comments into positive and negative and then analyzing and visualizing keyword centrality through social network analysis. The analysis found that negative objective reviews had the highest effect on information usefulness. This reaffirms previous studies showing that negative online information has a strong effect on personal decision-making. Corporate marketers can analyze user comments on social network services (SNS) to detect negative opinions about products or corporate images, which can serve as an opportunity to satisfy customers' needs.
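The dictionary-based polarity step can be sketched as follows. The study performed this step in R; the Python version below is only an illustration, and the two word lists, the tokenizer pattern, and the function name are hypothetical stand-ins for the sentiment dictionaries the authors used.

```python
import re

# Hypothetical miniature sentiment dictionaries (placeholders for real lexicons).
POSITIVE = {"good", "great", "helpful"}
NEGATIVE = {"bad", "poor", "misleading"}


def polarity(comment):
    """Score a comment as (#positive tokens) - (#negative tokens)."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    return positive_hits - negative_hits


comments = [
    "Great report, very helpful",
    "Bad and misleading headline",
    "No opinion on this one",
]
scores = [polarity(comment) for comment in comments]
# Split comments by sign before running a separate network analysis on each group.
positive_comments = [c for c, s in zip(comments, scores) if s > 0]
negative_comments = [c for c, s in zip(comments, scores) if s < 0]
```

Comments with a score of zero fall into neither group here; whether to keep them as a neutral class is a design choice the abstract does not address.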

A Study on the Factors Affecting the Intention of Chinese Users to Discriminate Against Fake News on Social Media - Focusing on attitude, social capital, and risk detection - (중국 이용자 소셜미디어 가짜뉴스 판별의도에 미치는 요인에 관한 연구 -태도, 사회자본, 위험감지를 중심으로-)

  • Tan, KeHong;Lee, Hwa Haeng
    • The Journal of the Korea Contents Association / v.22 no.4 / pp.337-351 / 2022
  • With the spread and rapid development of social media, the decentralization of information propagation on social media is becoming clearer day by day, and audiences' use of social media information is becoming increasingly fragmented in time. Building on existing studies, this study examines the relationships among attitudes toward fake news on social media, social capital, risk perception, and the intention to identify fake news. Research questions were derived from the research model, and a questionnaire was administered to collect a total of 500 valid responses. The SPSS 26.0 and AMOS 24.0 programs were used to analyze the data. The results are as follows. First, the more positive users' attitudes toward identifying fake news on social media, the more they want to use various methods or tools to verify the authenticity of online information. Second, the more positive users' attitudes toward social media fake news, the more aware they are of the potential physical, psychological, financial, and other threats that it poses to them. At the same time, raising awareness of these dangers also increases the intention to identify fake news on social media. Third, the richer users' social capital, the stronger their information literacy, and therefore the stronger their intention to identify fake news on social media. Fourth, the more social capital Chinese users have, the greater the damage they have suffered from fake news, and the higher their risk awareness of fake news as they seek to protect their interests. Fifth, Chinese users recognized suspicious information on social media and took corresponding measures.

Design and Implementation of Real-Time News App using RSS of the Internet Newspaper (신문사 RSS를 활용한 실시간뉴스 어플리케이션 설계 및 구현)

  • Song, Ju-Whan
    • Journal of Digital Contents Society / v.19 no.4 / pp.631-637 / 2018
  • Paper newspapers are used less and less to read news articles, while smartphones are used more and more. As a result, the number of news apps continues to increase. Most news apps in the Android Play Store fall into two categories. The first is an app developed by a specific newspaper company that distributes only that company's articles. The second is an app that shows a list of newspapers and opens the homepage of whichever newspaper is selected. In this paper, we design and implement a Real-Time News app that collects articles from many newspapers and provides them in real time. Newspapers provide up-to-date articles through RSS feeds. The server program stores them in a database and transmits the articles requested by the Real-Time News app in real time. Users can see the latest news from each newspaper without visiting the newspapers' websites, which also reduces the mobile data used to access each website.
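The server-side collection step, reading several newspapers' RSS feeds and merging their articles newest first, might look like the sketch below. It assumes standard RSS 2.0 `item` elements with `title`, `link`, and `pubDate`; the feed contents, URLs, and function names are invented for illustration, and the paper's actual server and database code is not shown in the abstract.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_feed(xml_text):
    """Extract title, link, and publication time from each RSS 2.0 item."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # RSS 2.0 dates use the RFC 822 format, which email.utils can parse.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return articles


def merge_latest(feeds):
    """Combine articles from several feeds, newest first."""
    merged = [article for xml_text in feeds for article in parse_feed(xml_text)]
    return sorted(merged, key=lambda article: article["published"], reverse=True)


# Two invented feeds standing in for different newspapers.
FEED_A = """<rss version="2.0"><channel><title>Paper A</title>
<item><title>A1</title><link>https://example.com/a1</link>
<pubDate>Mon, 02 Apr 2018 09:00:00 +0900</pubDate></item>
</channel></rss>"""
FEED_B = """<rss version="2.0"><channel><title>Paper B</title>
<item><title>B1</title><link>https://example.com/b1</link>
<pubDate>Mon, 02 Apr 2018 10:30:00 +0900</pubDate></item>
</channel></rss>"""

latest = merge_latest([FEED_A, FEED_B])
```

A real server would fetch each feed over HTTP on a schedule and store new items in its database; the strings above only stand in for downloaded feed bodies.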

Building a Korean Text Summarization Dataset Using News Articles of Social Media (신문기사와 소셜 미디어를 활용한 한국어 문서요약 데이터 구축)

  • Lee, Gyoung Ho;Park, Yo-Han;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering / v.9 no.8 / pp.251-258 / 2020
  • A training dataset for text summarization consists of pairs of a document and its summary. Because conventional approaches to building text summarization datasets are labor-intensive, it is not easy to construct large ones. Collections of news articles are among the most popular resources for text summarization because they are easily accessible, large-scale, and high-quality. From social media news services, we can collect not only the headlines and subheads of news articles but also the summary descriptions that human editors write about them. We collected approximately 425,000 pairs of news articles and their summaries from social media. We implemented an automatic extractive summarizer and trained it on the dataset. Compared with unsupervised models, the summarizer achieved better ROUGE scores.
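For readers unfamiliar with the ROUGE metric mentioned above, the following is a minimal sketch of ROUGE-1, the unigram-overlap variant. It assumes whitespace tokenization and lowercasing, which is a simplification (Korean text would normally need morphological tokenization), and the abstract does not say which ROUGE variants were reported. The function name and example sentences are hypothetical.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Return (precision, recall, F1) of unigram overlap between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


precision, recall, f1 = rouge_1("the cat sat", "the cat sat on the mat")
```

Here the candidate summary is fully contained in the reference, so precision is 1.0, while recall is 0.5 because only three of the six reference tokens are covered.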