• Title/Summary/Keyword: news paper articles

Search Result 148, Processing Time 0.021 seconds

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.93-107
    • /
    • 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. Our approach for user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites by crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-news network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed with access patterns derived from web transaction logs. The double two-mode networks are then merged into a quasi-network of user-issue. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence, and compare these with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network. After that, we compared the results of issue clustering from SAS with that of network analysis. The experimental dataset was from a web site ranking site, and the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles of 5,000 panels over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using Netminer. Our issue-clustering results applied the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), and are consistent with the results from SAS clustering. In spite of extensive efforts to provide user information with recommendation systems, most projects are successful only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support to decision-making in companies because it enhances user-related data from unstructured textual data. To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.

The Study on Implementation of Crime Terms Classification System for Crime Issues Response

  • Jeong, Inkyu;Yoon, Cheolhee;Kang, Jang Mook
    • International Journal of Advanced Culture Technology
    • /
    • v.8 no.3
    • /
    • pp.61-72
    • /
    • 2020
  • The fear of crime, discussed in the early 1960s in the United States, is a psychological response, such as anxiety or concern about crime, the potential victim of a crime. These anxiety factors lead to the burden of the individual in securing the psychological stability and indirect costs of the crime against the society. Fear of crime is not a good thing, and it is a part that needs to be adjusted so that it cannot be exaggerated and distorted by the policy together with the crime coping and resolution. This is because fear of crime has as much harm as damage caused by criminal act. Eric Pawson has argued that the popular impression of violent crime is not formed because of media reports, but by official statistics. Therefore, the police should watch and analyze news related to fear of crime to reduce the social cost of fear of crime and prepare a preemptive response policy before the people have 'fear of crime'. In this paper, we propose a deep - based news classification system that helps police cope with crimes related to crimes reported in the media efficiently and quickly and precisely. The goal is to establish a system that can quickly identify changes in security issues that are rapidly increasing by categorizing news related to crime among news articles. To construct the system, crime data was learned so that news could be classified according to the type of crime. Deep learning was applied by using Google tensor flow. In the future, it is necessary to continue research on the importance of keyword according to early detection of issues that are rapidly increasing by crime type and the power of the press, and it is also necessary to constantly supplement crime related corpus.

Keyword Extraction from News Corpus using Modified TF-IDF (TF-IDF의 변형을 이용한 전자뉴스에서의 키워드 추출 기법)

  • Lee, Sung-Jick;Kim, Han-Joon
    • The Journal of Society for e-Business Studies
    • /
    • v.14 no.4
    • /
    • pp.59-73
    • /
    • 2009
  • Keyword extraction is an important and essential technique for text mining applications such as information retrieval, text categorization, summarization and topic detection. A set of keywords extracted from a large-scale electronic document data are used for significant features for text mining algorithms and they contribute to improve the performance of document browsing, topic detection, and automated text classification. This paper presents a keyword extraction technique that can be used to detect topics for each news domain from a large document collection of internet news portal sites. Basically, we have used six variants of traditional TF-IDF weighting model. On top of the TF-IDF model, we propose a word filtering technique called 'cross-domain comparison filtering'. To prove effectiveness of our method, we have analyzed usefulness of keywords extracted from Korean news articles and have presented changes of the keywords over time of each news domain.

  • PDF

Categorization of Korean News Articles Based on Convolutional Neural Network Using Doc2Vec and Word2Vec (Doc2Vec과 Word2Vec을 활용한 Convolutional Neural Network 기반 한국어 신문 기사 분류)

  • Kim, Dowoo;Koo, Myoung-Wan
    • Journal of KIISE
    • /
    • v.44 no.7
    • /
    • pp.742-747
    • /
    • 2017
  • In this paper, we propose a novel approach to improve the performance of the Convolutional Neural Network(CNN) word embedding model on top of word2vec with the result of performing like doc2vec in conducting a document classification task. The Word Piece Model(WPM) is empirically proven to outperform other tokenization methods such as the phrase unit, a part-of-speech tagger with substantial experimental evidence (classification rate: 79.5%). Further, we conducted an experiment to classify ten categories of news articles written in Korean by feeding words and document vectors generated by an application of WPM to the baseline and the proposed model. From the results of the experiment, we report the model we proposed showed a higher classification rate (89.88%) than its counterpart model (86.89%), achieving a 22.80% improvement. Throughout this research, it is demonstrated that applying doc2vec in the document classification task yields more effective results because doc2vec generates similar document vector representation for documents belonging to the same category.

A Study on Differences of Contents and Tones of Arguments among Newspapers Using Text Mining Analysis (텍스트 마이닝을 활용한 신문사에 따른 내용 및 논조 차이점 분석)

  • Kam, Miah;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.53-77
    • /
    • 2012
  • This study analyses the difference of contents and tones of arguments among three Korean major newspapers, the Kyunghyang Shinmoon, the HanKyoreh, and the Dong-A Ilbo. It is commonly accepted that newspapers in Korea explicitly deliver their own tone of arguments when they talk about some sensitive issues and topics. It could be controversial if readers of newspapers read the news without being aware of the type of tones of arguments because the contents and the tones of arguments can affect readers easily. Thus it is very desirable to have a new tool that can inform the readers of what tone of argument a newspaper has. This study presents the results of clustering and classification techniques as part of text mining analysis. We focus on six main subjects such as Culture, Politics, International, Editorial-opinion, Eco-business and National issues in newspapers, and attempt to identify differences and similarities among the newspapers. The basic unit of text mining analysis is a paragraph of news articles. This study uses a keyword-network analysis tool and visualizes relationships among keywords to make it easier to see the differences. Newspaper articles were gathered from KINDS, the Korean integrated news database system. KINDS preserves news articles of the Kyunghyang Shinmun, the HanKyoreh and the Dong-A Ilbo and these are open to the public. This study used these three Korean major newspapers from KINDS. About 3,030 articles from 2008 to 2012 were used. International, national issues and politics sections were gathered with some specific issues. The International section was collected with the keyword of 'Nuclear weapon of North Korea.' The National issues section was collected with the keyword of '4-major-river.' The Politics section was collected with the keyword of 'Tonghap-Jinbo Dang.' All of the articles from April 2012 to May 2012 of Eco-business, Culture and Editorial-opinion sections were also collected. All of the collected data were handled and edited into paragraphs. We got rid of stop-words using the Lucene Korean Module. We calculated keyword co-occurrence counts from the paired co-occurrence list of keywords in a paragraph. We made a co-occurrence matrix from the list. Once the co-occurrence matrix was built, we used the Cosine coefficient matrix as input for PFNet(Pathfinder Network). In order to analyze these three newspapers and find out the significant keywords in each paper, we analyzed the list of 10 highest frequency keywords and keyword-networks of 20 highest ranking frequency keywords to closely examine the relationships and show the detailed network map among keywords. We used NodeXL software to visualize the PFNet. After drawing all the networks, we compared the results with the classification results. Classification was firstly handled to identify how the tone of argument of a newspaper is different from others. Then, to analyze tones of arguments, all the paragraphs were divided into two types of tones, Positive tone and Negative tone. To identify and classify all of the tones of paragraphs and articles we had collected, supervised learning technique was used. The Na$\ddot{i}$ve Bayesian classifier algorithm provided in the MALLET package was used to classify all the paragraphs in articles. After classification, Precision, Recall and F-value were used to evaluate the results of classification. Based on the results of this study, three subjects such as Culture, Eco-business and Politics showed some differences in contents and tones of arguments among these three newspapers. In addition, for the National issues, tones of arguments on 4-major-rivers project were different from each other. It seems three newspapers have their own specific tone of argument in those sections. And keyword-networks showed different shapes with each other in the same period in the same section. It means that frequently appeared keywords in articles are different and their contents are comprised with different keywords. And the Positive-Negative classification showed the possibility of classifying newspapers' tones of arguments compared to others. These results indicate that the approach in this study is promising to be extended as a new tool to identify the different tones of arguments of newspapers.

Quantitative Text Mining for Social Science: Analysis of Immigrant in the Articles (사회과학을 위한 양적 텍스트 마이닝: 이주, 이민 키워드 논문 및 언론기사 분석)

  • Yi, Soo-Jeong;Choi, Doo-Young
    • The Journal of the Korea Contents Association
    • /
    • v.20 no.5
    • /
    • pp.118-127
    • /
    • 2020
  • The paper introduces trends and methodological challenges of quantitative Korean text analysis by using the case studies of academic and news media articles on "migration" and "immigration" within the periods of 2017-2019. The quantitative text analysis based on natural language processing technology (NLP) and this became an essential tool for social science. It is a part of data science that converts documents into structured data and performs hypothesis discovery and verification as the data and visualize data. Furthermore, we examed the commonly applied social scientific statistical models of quantitative text analysis by using Natural Language Processing (NLP) with R programming and Quanteda.

Article Data Prefetching Policy using User Access Patterns in News-On-demand System (주문형 전자신문 시스템에서 사용자 접근패턴을 이용한 기사 프리패칭 기법)

  • Kim, Yeong-Ju;Choe, Tae-Uk
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.5
    • /
    • pp.1189-1202
    • /
    • 1999
  • As compared with VOD data, NOD article data has the following characteristics: it is created at any time, has a short life cycle, is selected as not one article but several articles by a user, and has high access locality in time. Because of these intrinsic features, user access patterns of NOD article data are different from those of VOD. Thus, building NOD system using the existing techniques of VOD system leads to poor performance. In this paper, we analysis the log file of a currently running electronic newspaper, show that the popularity distribution of NOD articles is different from Zipf distribution of VOD data, and suggest a new popularity model of NOD article data MS-Zipf(Multi-Selection Zipf) distribution and its approximate solution. Also we present a life cycle model of NOD article data, which shows changes of popularity over time. Using this life cycle model, we develop LLBF (Largest Life-cycle Based Frequency) prefetching algorithm and analysis he performance by simulation. The developed LLBF algorithm supports the similar level in hit-ratio to the other prefetching algorithms such as LRU(Least Recently Used) etc, while decreasing the number of data replacement in article prefetching and reducing the overhead of the prefetching in system performance. Using the accurate user access patterns of NOD article data, we could analysis correctly the performance of NOD server system and develop the efficient policies in the implementation of NOD server system.

  • PDF

Ideological Discrepancies in News Media: Focusing on the 2016 U.S. Presidential Election (뉴스미디어에서의 이데올로기 차이: 2016년 미국 대선을 중심으로)

  • Noh, Bokyung;Ban, Hyun
    • The Journal of the Convergence on Culture Technology
    • /
    • v.3 no.4
    • /
    • pp.101-106
    • /
    • 2017
  • This paper investigates how news media frame news editorials to deliver their subjective ideological stance through news discourse related with two candidates in 2016 U.S presidential election. For this purpose, 13 editorials were chosen and analyzed which appeared on the New York Time for the period from Sept. 1 to Sept. 30, almost two months prior to the election, giving special attention to the headlines of those editorials and the expressive linguistic forms in the selected two articles, based on the two theoretical frameworks-van Dijk' (1996)'s ideological square and Martin and White (2005)'s Appraisal Theory. The results are as follows: (1) editorials clearly supported Hillary Clinton; (2) following the appraisal theory, the category of 'feeling' was applied in expressing the preference for Hillary, whereas the strategy of judgment for Trump, where the strategy of 'emphasis' from the ideological framework were used for both candidates.

Domestic Production Process of Early Korean Broadcast CG Equipment

  • Nah, So-Mi
    • Journal of Multimedia Information System
    • /
    • v.8 no.4
    • /
    • pp.285-294
    • /
    • 2021
  • The development process of domestic CG (Computer Graphics) needs to be studied by classifying the technical part and the design part. This paper analyzes how early domestic broadcast equipment evolved and focused on what went through the development process. Domestic broadcasting companies have introduced and used overseas equipment and have begun to develop their own technology. The development process of the early domestic production technology of broadcasting stations was classified into Character Generator, On-line Real-Time Graphic and Sport Coder. It was found that the orientation of broadcasting technology was conclusively focused on visualization, with socio-cultural factors acting along with the evolution of hardware and software. This research is meaningful in reorganizing the history of the development process of domestic CG technology in the early days through previous research (primary verification), newspaper articles and news coverage (secondary verification). This paper looks at what is missing by regaining the past of CG, and argues that with the advent of new technologies today, we must develop through appropriate division of roles and collaboration between engineers and designers.

A Service Marketing Success Using Internet of Things: The Case of Cleveland Museum of Arts in the United States (사물인터넷을 이용한 서비스 마케팅의 성공: 미국 클리블랜드 미술관의 사례)

  • Joo, Mi-Kyoung;Kim, Myung-Hee
    • Journal of Digital Convergence
    • /
    • v.14 no.11
    • /
    • pp.549-555
    • /
    • 2016
  • This paper tries to find the implications applied to art institutions of Korea by analyzing what the content and consequences of services of the Cleveland Museum of Art provided after the Internet of Things adoption in the theoretical perspectives strengthening service marketing through the Internet of Things increases visits and participation in the museum. For analysis, journal articles, statistics, and government press releases, news articles, web pages, web page articles are collected. The results are first, the application of digital and social media is positive in visit to the museum and monetization. Second, the introduction of the Internet of Things will enhance the museum's 'Education', 'Accessibility', 'Communications' services, as well as significantly increase on and offline participation by activating experience. Third, technical support from related companies with the support of local government funds is essential to build the system. Consequently, in order to enable the participation of the spectators in the galleries it is proposed aggressive adoption of such digital technology in domestic barren market of Internet of Things.