• Title/Summary/Keyword: News Article Classification


Sentiment Analysis of COVID-19 Tweets: Impact of Pre-processing Step

  • Ayadi, Rami;Shahin, Osama R.;Ghorbel, Osama;Alanazi, Rayan;Saidi, Anouar
    • International Journal of Computer Science & Network Security / v.21 no.3 / pp.206-211 / 2021
  • Internet users are increasingly invited to express their opinions on various subjects on social networks, e-commerce sites, news sites, forums, and so on. Much of this information, which describes feelings, has become a subject of study in research areas such as opinion detection and sentiment analysis: the process of identifying the polarity of the feelings expressed in the opinions that Internet users post on the web and classifying them as positive, negative, or neutral. In this article, we propose a sentiment analysis tool that detects the polarity of opinions about COVID-19 extracted from social media (Twitter) in the Arabic language, and we examine the impact of the pre-processing phase on opinion classification. The results reveal gaps in this area of research: first, a lack of resources for data collection; second, the greater complexity of pre-processing Arabic, especially its dialects. Nevertheless, the results obtained are promising.

Trends of South Korea's Informatization and Libraries' Role Based on Newspaper Big Data (신문 빅데이터를 바탕으로 본 국내 정보화의 경향과 도서관의 역할)

  • Na, Kyoungsik;Lee, Jisu
    • The Journal of the Korea Contents Association / v.18 no.9 / pp.14-33 / 2018
  • The purpose of this study is to analyze informatization trends in Korea using objective newspaper data on informatization and libraries from 1998 to 2017, drawn from four newspapers: KyoungHyang Newspaper, Kookmin Ilbo, Hankyoreh, and Hankookilbo. Based on an analysis of metadata and related words using BIGKinds, a news big data system, this study presents a simple frequency analysis and a classification of the keywords 'information', 'informatization', and 'library'. We then compared the tendency of informatization in the media with the 'Information White Paper', a publication of government agencies, and with studies of the research topics of four academic journals in the Library and Information Science field. This study interprets trends in informatization as reflected in the media, and it is meaningful in that it analyzes newspaper big data, which is long-term time series data. Based on the results, implications for the growth and development of libraries alongside domestic informatization are suggested. Further studies are expected to yield a basic framework for developing library informatization policy.

Curation Service to Improve User's Access to National R & D Information : Focusing on Issues R&D Service (사용자의 국가 R&D 정보 이용 접근성 향상을 위한 큐레이션 서비스 : 이슈로 보는 R&D 사례를 중심으로)

  • Yu, Eun-ji;Choi, Kwang-Nam;Hwang, Youna
    • The Journal of the Korea Contents Association / v.20 no.9 / pp.1-10 / 2020
  • National R&D data covers information in all fields, from basic science research to industrialization, but it is expressed in technical terms, which makes it difficult for the public to use. Accordingly, NTIS developed and launched the data curation service 'R&D issue service', which selects national R&D information on national and social issues and provides it to the public. This study aims to analyze the effect of the data curation service on NTIS users' access to R&D data and to suggest how the curation service should be developed. The R&D issue service extracts issues from news articles and provides the related national R&D projects, achievements, and major research institutes. All raw data used for the service are open to the public, organized in a report format, and provided as PDF files. In addition, an automated process was developed so that any NTIS user can package individual issues in the same way as an administrator. The results show that launching the 'R&D issue service' increased users' access to, and convenience in using, R&D data related to major issues, and that the number of page views increased after the service opened.

A Literature Review and Classification of Recommender Systems on Academic Journals (추천시스템관련 학술논문 분석 및 분류)

  • Park, Deuk-Hee;Kim, Hyea-Kyeong;Choi, Il-Young;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.139-152 / 2011
  • Recommender systems have become an important research field since the first paper on collaborative filtering appeared in the mid-1990s. In general, recommender systems are defined as supporting systems that help users find information, products, or services (such as books, movies, music, digital products, web sites, and TV programs) by aggregating and analyzing suggestions from other users (that is, reviews from various authorities) and user attributes. However, although academic research on recommender systems has increased significantly over the last ten years, more research is required before it can be applied in real-world situations, because the field is still broad and less mature than other research fields. Accordingly, the existing articles on recommender systems need to be reviewed with a view to the next generation of recommender systems. Given the nature of this research, however, it is not easy to confine it to specific disciplines. We therefore reviewed all articles on recommender systems published from 2001 to 2010 in 37 journals selected from the top 125 journals of the MIS Journal Rankings. The literature search was based on the descriptors "Recommender system", "Recommendation system", "Personalization system", "Collaborative filtering" and "Contents filtering". The full text of each article was reviewed to eliminate articles that were not actually related to recommender systems. Many items were excluded because they were unfit for our research, such as conference papers, master's and doctoral dissertations, textbooks, unpublished working papers, non-English publications, and news. We classified the articles by year of publication, journal, recommendation field, and data mining technique.
The recommendation fields and data mining techniques of 187 articles were reviewed and classified into eight recommendation fields (book, document, image, movie, music, shopping, TV program, and others) and eight data mining techniques (association rule, clustering, decision tree, k-nearest neighbor, link analysis, neural network, regression, and other heuristic methods). The results presented in this paper have several significant implications. First, based on previous publication rates, interest in recommender system research will grow significantly in the future. Second, 49 articles are related to movie recommendation, whereas image and TV program recommendation are identified in only 6 articles. This result stems from the ease of using the MovieLens data set, so data sets for other fields need to be prepared. Third, social network analysis has recently been used in various applications, but studies on recommender systems using social network analysis are scarce. We therefore expect that new recommendation approaches using social network analysis will be developed. Evaluating recommender system research using social method analysis will also be an interesting area for further research. This result shows the trend of recommender system research by examining the published literature and provides practitioners and researchers with insight and future direction on recommender systems. We hope that this research helps anyone who is interested in recommender systems research to gain insight for future research.

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems / v.21 no.1 / pp.103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors to guide users and facilitate the filtering process. A set of keywords is thus often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document.
Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, by contrast, the aim is to extract keywords according to their relevance in the text without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document as positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as keywords for that document, and as a result keyword extraction is limited to terms that appear in the document. Therefore, keyword extraction cannot generate implicit keywords that are not included in a document. According to the experimental results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Conversely, this means that 10% to 36% of the keywords assigned by the authors do not appear in the article and cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of keywords assigned by the authors are not included in the full text. This is why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and it has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. Both systems perform much better than baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to Extractor, a representative system of the keyword extraction approach developed by Turney. As electronic documents increase, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
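
The five-step assignment process described in this abstract can be sketched roughly in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how keyword sets are represented, so the sketch assumes each candidate keyword carries a hypothetical term-weight vector (`keyword_vectors`), uses a naive lowercase tokenizer, and treats raw term frequency as the document weight. The function names `assign_keywords` and `precision` are likewise illustrative.

```python
import math
import re
from collections import Counter


def vector_length(weights):
    """Euclidean length of a sparse vector given as {term: weight} (steps 1 and 3)."""
    return math.sqrt(sum(w * w for w in weights.values()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors (step 4)."""
    len_a, len_b = vector_length(a), vector_length(b)
    if len_a == 0 or len_b == 0:
        return 0.0
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    return dot / (len_a * len_b)


def tokenize(text):
    """Assumed preprocessing (step 2): lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def assign_keywords(document, keyword_vectors, top_n=3):
    """Rank candidate keywords by similarity to the document (step 5).

    keyword_vectors is a hypothetical structure: {keyword: {term: weight}}.
    """
    doc_vector = Counter(tokenize(document))  # term-frequency weights
    scored = [
        (cosine_similarity(vec, doc_vector), keyword)
        for keyword, vec in keyword_vectors.items()
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [keyword for score, keyword in scored[:top_n] if score > 0]


def precision(generated, author_assigned):
    """Share of generated keywords that match the author-assigned ones."""
    if not generated:
        return 0.0
    matches = len(set(generated) & set(author_assigned))
    return matches / len(generated)


# Toy data, invented purely for illustration.
keyword_vectors = {
    "logistics": {"port": 2.0, "shipping": 3.0, "cargo": 1.0},
    "recommender systems": {"movie": 2.0, "rating": 2.0, "user": 1.0},
}
document = "Shipping volumes at the port rose as cargo handling improved."
print(assign_keywords(document, keyword_vectors, top_n=1))
```

Because the candidate keywords come from a separate vocabulary rather than from the document text, a keyword such as "logistics" can be assigned even though the word itself never appears in the document, which is the property that motivates the assignment approach over extraction.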