• Title/Summary/Keyword: news article clustering (뉴스 기사 클러스터링)

User Oriented clustering of news articles using Tweets Heterogeneous Information Network (트위트 이형 정보 망을 이용한 뉴스 기사의 사용자 지향적 클러스터링)

  • Shoaib, Muhammad;Song, Wang-Cheol
    • Journal of Internet Computing and Services / v.14 no.6 / pp.85-94 / 2013
  • With the emergence of the World Wide Web, and Web 2.0 in particular, the rapidly growing volume of news articles has made it difficult for users to select articles that match their needs. To address this problem, various clustering mechanisms have been proposed to categorize news articles broadly. However, these techniques are entirely machine-oriented and give users no part in deciding cluster membership. To overcome this lack of user participation, this paper proposes a framework that clusters news articles by combining them with the judgments users post on Twitter. We employ Twitter hashtags for this purpose. Furthermore, we compute each user's credibility from the frequency with which their tweets are retweeted, in order to improve the accuracy of the cluster membership function. To test the proposed methodology, we performed experiments on tweets posted during the 2013 general election in Pakistan. The results support our claim that using users' output can achieve better outcomes than ordinary clustering algorithms.

Design and Development of a Personalized News Recommendation System (개인 맞춤형 뉴스 추천 시스템의 설계 및 개발)

  • Yu, YoungSeo;Lee, Jimin;Lee, Ki Yong
    • Proceedings of the Korea Information Processing Society Conference / 2016.04a / pp.599-602 / 2016
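  • As online news systems that deliver news articles in real time have come into wide use, people are exposed at every moment to a large volume of news articles, including breaking and newly published news. However, finding the news a user wants among this vast amount of news is very difficult. The need for personalized news recommendation systems that recommend news according to individual interests is therefore growing. In this paper, we design and develop a news recommendation system that analyzes a user's interests and automatically recommends related news accordingly. The proposed system builds a profile for each user by clustering the news articles that the user has bookmarked and read. It also clusters the entire collection of news articles to classify them by topic. To recommend news to a user, the system finds, for each cluster in that user's profile, the closest cluster among the clusters of the entire collection and recommends the articles in that cluster in order of distance. Using an implemented system, we show that the proposed news recommendation system recommends news effectively to each individual.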
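
The recommendation step in the abstract above (match each profile cluster to its nearest global cluster, then rank that cluster's articles by distance) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of Euclidean distance, and the choice to rank articles by distance to the profile centroid are all assumptions, since the abstract does not specify them.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(profile_centroids, global_clusters, top_n=3):
    """Hypothetical recommender.

    profile_centroids: centroids of the clusters in one user's profile.
    global_clusters: list of (centroid, [(article_id, vector), ...]).
    Returns article ids, nearest first, without duplicates.
    """
    recommendations = []
    for p in profile_centroids:
        # Nearest global cluster to this profile cluster (assumed metric).
        _, members = min(global_clusters, key=lambda c: euclidean(p, c[0]))
        # Rank that cluster's articles by distance to the profile centroid.
        ranked = sorted(members, key=lambda m: euclidean(p, m[1]))
        for article_id, _ in ranked[:top_n]:
            if article_id not in recommendations:
                recommendations.append(article_id)
    return recommendations
```

In a real system the vectors would come from whatever document representation the clustering uses; here they are plain tuples of floats.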

A Study on an Effective Event Detection Method for Event-Focused News Summarization (사건중심 뉴스기사 자동요약을 위한 사건탐지 기법에 관한 연구)

  • Chung, Young-Mee;Kim, Yong-Kwang
    • Journal of the Korean Society for information Management / v.25 no.4 / pp.227-243 / 2008
  • This study investigates an event detection method aimed at generating an event-focused news summary from a set of news articles on a given event using a multi-document summarization technique. The method first classifies news articles into event-related topic categories with an SVM classifier and then creates event clusters, each containing the news articles on one event, with a modified single-pass clustering algorithm. The clustering algorithm applies a time penalty function as well as cluster partitioning to improve clustering performance. The proposed event detection method showed satisfactory performance in terms of both the F-measure and the detection cost.
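
As a rough illustration of single-pass clustering with a time penalty, the sketch below assigns each incoming article to the most similar existing cluster, scaling similarity down as the gap since that cluster's last article grows. The linear penalty, the 7-day window, the threshold, and all names are assumptions for illustration; the abstract does not give the paper's actual penalty function, and the cluster partitioning step is omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def time_penalty(gap_days, window=7):
    """Assumed linear decay: 1.0 for the same day, 0.0 at or beyond the window."""
    return max(0.0, 1.0 - gap_days / window)


def single_pass(docs, threshold=0.5, window=7):
    """docs: list of (day, term_weights), sorted by day.

    Returns clusters as lists of document indices.
    """
    clusters = []
    for idx, (day, vec) in enumerate(docs):
        best, best_score = None, 0.0
        for c in clusters:
            gap = day - c["last_day"]
            score = cosine(vec, c["centroid"]) * time_penalty(gap, window)
            if score > best_score:
                best, best_score = c, score
        if best is not None and best_score >= threshold:
            # Summed vector serves as the centroid (cosine ignores scale).
            for t, w in vec.items():
                best["centroid"][t] = best["centroid"].get(t, 0.0) + w
            best["last_day"] = day
            best["members"].append(idx)
        else:
            clusters.append({"centroid": dict(vec), "last_day": day, "members": [idx]})
    return [c["members"] for c in clusters]
```

With this penalty, two articles with identical wording still start separate event clusters if they are published more than `window` days apart.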

Contextual Advertisement System based on Document Clustering (문서 클러스터링을 이용한 문맥 광고 시스템)

  • Lee, Dong-Kwang;Kang, In-Ho;An, Dong-Un
    • The KIPS Transactions:PartB / v.15B no.1 / pp.73-80 / 2008
  • This paper proposes a method for finding advertisement keywords using document clustering, in order to solve problems caused by ambiguous words and by incorrect identification of main keywords. News articles that have similar content and the same advertisement keywords are clustered to construct the contextual information of those advertisement keywords. In addition to news articles, the web page and summary of a product are also used to construct the contextual information. A given document is assigned to one of the news article clusters, and the advertisement keywords relevant to that cluster are then used to identify keywords in the document. The proposed method achieved a 21% improvement in precision.

A Study on the Deduction of Social Issues Applying Word Embedding: With an Emphasis on News Articles related to the Disables (단어 임베딩(Word Embedding) 기법을 적용한 키워드 중심의 사회적 이슈 도출 연구: 장애인 관련 뉴스 기사를 중심으로)

  • Choi, Garam;Choi, Sung-Pil
    • Journal of the Korean Society for information Management / v.35 no.1 / pp.231-250 / 2018
  • In this paper, we propose a new methodology for extracting and formalizing subjective topics at a specific time using a set of keywords extracted automatically from online news articles. We first extracted a set of keywords by applying the TF-IDF method, which was selected through a series of comparative experiments on various statistical weighting schemes that measure the importance of individual words in a large set of texts. To calculate the semantic relations between the extracted keywords effectively, we constructed a set of word embedding vectors from about 1,000,000 separately collected news articles. The extracted keywords were represented as numerical vectors and clustered with the K-means algorithm. A qualitative in-depth analysis of the resulting keyword clusters showed that most clusters were judged to be appropriate topics, with sufficient semantic concentration for labels to be assigned to them easily.
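
The keyword extraction step can be illustrated with a small TF-IDF sketch. It uses one common TF-IDF variant (relative term frequency times log of inverse document frequency), which may differ from the exact weighting the authors selected; the function name and the tie-breaking rule are assumptions. The embedding and K-means steps are not shown.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=2):
    """docs: list of token lists. Returns the top_k keywords per document.

    Score = (term count / document length) * log(N / document frequency).
    Ties are broken alphabetically so the output is deterministic.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_k])
    return result
```

A term that occurs in every document scores zero under this variant, which is why very common words drop out of the keyword lists.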

A news visualization based on an algorithm by journalistic values (저널리즘 가치에 기초한 알고리즘을 이용한 뉴스 시각화)

  • Park, Daemin;Kim, Gi-Nam;Kang, Nam-Yong;Suh, Bongwon;Ha, Hyo-Ji;On, Byung-Won
    • Journal of the HCI Society of Korea / v.9 no.2 / pp.5-12 / 2014
  • Online news services have been widely criticized for their bias toward sensational and soft news, so there is social demand for news services based on journalistic values. A previous study suggested news source network analysis (NSNA), an algorithm that clusters and weights news sources, quotes, and articles, as a method for emphasizing journalistic values such as facts, variety, depth, and criticism. This study proposes 'News Sources' as a visualization tool for NSNA. 'News Sources' shows news as bar graphs, weighted by facts and criticism and arranged by organizations and subjects. A beta version was designed using KINDS, a news archive of the Korea Press Foundation.

News Video Shot Boundary Detection using Singular Value Decomposition and Incremental Clustering (특이값 분해와 점증적 클러스터링을 이용한 뉴스 비디오 샷 경계 탐지)

  • Lee, Han-Sung;Im, Young-Hee;Park, Dai-Hee;Lee, Seong-Whan
    • Journal of KIISE:Software and Applications / v.36 no.2 / pp.169-177 / 2009
  • In this paper, we propose a new shot boundary detection method optimized for news video story parsing. The method was designed to satisfy all of the following requirements: 1) minimizing incorrect data in the data set for anchor shot detection by improving the recall ratio; 2) detecting abrupt cuts and gradual transitions with a single algorithm, so that news video is divided into shots in one scan of the data set; 3) classifying shots as static or dynamic, thereby reducing the search space for the subsequent stage of anchor shot detection. The proposed method, based on singular value decomposition with incremental clustering and a Mercer kernel, has additional desirable features. Applying singular value decomposition removes noise and trivial variations in the video sequence, which improves separability. The Mercer kernel makes it more likely that shots that are not separable in the input space are detected, by mapping the data to a high-dimensional feature space. The experimental results demonstrated the superiority of the proposed method in terms of recall and search space reduction for anchor shot detection.

Analysis and Visualization for Comment Messages of Internet Posts (인터넷 게시물의 댓글 분석 및 시각화)

  • Lee, Yun-Jung;Ji, Jeong-Hoon;Woo, Gyun;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association / v.9 no.7 / pp.45-56 / 2009
  • Many internet users gather public opinion and express their own opinions on internet news or blog articles through reply comments in online communities. However, it is hard to search for and explore useful messages on weblogs, since most weblog systems show articles and their comments as a sequential list. In addition, spam and malicious comments have become a social problem as the number of internet users has grown. In this paper, we propose a system that uses similarity analysis to cluster and visualize the reply comments on a large-scale weblog, namely 'Daum AGORA.' The system shows the comment clustering result in a simple screen view. It also detects spam comments using the Needleman-Wunsch algorithm, a well-known algorithm in bioinformatics.
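
For readers unfamiliar with it, the Needleman-Wunsch algorithm computes a global alignment score between two sequences by dynamic programming. The sketch below applies it to comment strings at the character level. The scoring values (+1 match, -1 mismatch, -1 gap) and the `looks_duplicated` helper with its 0.8 ratio are illustrative assumptions; the abstract does not say how the authors turn alignment scores into a spam decision.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b (score only, no traceback)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def looks_duplicated(a, b, ratio=0.8):
    """Hypothetical near-duplicate check: score relative to the longer comment."""
    longest = max(len(a), len(b))
    return longest > 0 and needleman_wunsch(a, b) >= ratio * longest
```

Keeping only two rows of the score table keeps memory proportional to the shorter dimension, which matters when many comment pairs are compared.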

Hierarchical Automatic Classification of News Articles based on Association Rules (연관규칙을 이용한 뉴스기사의 계층적 자동분류기법)

  • Joo, Kil-Hong;Shin, Eun-Young;Lee, Joo-Il;Lee, Won-Suk
    • Journal of Korea Multimedia Society / v.14 no.6 / pp.730-741 / 2011
  • With the development of the internet and computer technology, the amount of information available through the internet is increasing rapidly, and it is managed in document form. Research into methods for managing large numbers of documents effectively is therefore necessary. Conventional document categorization methods use only the keywords of related documents for classification. This paper instead proposes a keyword extraction method based on association rules. The method extracts a set of related keywords associated with a document's category and classifies representative keywords using the classification rule proposed in this paper. In addition, this paper proposes a preprocessing method for efficient keyword generation and predicts the category of a new document. We design the classifier and measure its performance through experiments, with the aim of increasing the profile's classification performance. When predicting the category, substituting all the classification rules one by one is the major cause of reduced processing performance in a profile. Finally, this paper suggests an automatic categorization scheme that can be applied to a hierarchical category architecture, extended from a simple category architecture.

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.93-107 / 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. 
Our approach to user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites with a crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-issue network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed from access patterns derived from web transaction logs. The double two-mode networks are then merged into a user-issue quasi-network. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence and compare the result with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network, after which we compared the issue clustering results from SAS with those from network analysis. The experimental dataset came from a web site ranking site and from the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles for 5,000 panel members over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using NetMiner. Our issue-clustering results, obtained with the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), are consistent with the results from SAS clustering. Despite extensive efforts to provide user information through recommendation systems, most projects succeed only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support for decision-making in companies because it enhances user-related data with unstructured textual data. 
To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
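
The network merging phase described above can be pictured as a product of two bipartite (two-mode) networks: user-article weights are pushed through article-issue weights to give user-issue weights. The sketch below is one plausible reading under stated assumptions, not the procedure the authors ran in NetMiner; the function name, the dict-of-dicts representation, and the multiply-and-sum rule are all hypothetical.

```python
from collections import defaultdict


def merge_two_mode(user_article, article_issue):
    """Merge two two-mode networks into a user-issue quasi-network.

    user_article: {user: {article: visit_count}}
    article_issue: {article: {issue: weight}}
    Returns {user: {issue: summed visit_count * weight}}.
    """
    user_issue = defaultdict(lambda: defaultdict(float))
    for user, articles in user_article.items():
        for article, visits in articles.items():
            # Articles with no extracted issue contribute nothing.
            for issue, weight in article_issue.get(article, {}).items():
                user_issue[user][issue] += visits * weight
    return {u: dict(issues) for u, issues in user_issue.items()}
```

The resulting user-issue weights could then be fed to a clustering method such as PAM, which the abstract names, though that step is not shown here.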