• Title/Abstract/Keywords: News Article Classification

Search results: 24 (processing time: 0.019 seconds)

심층 주제, 지역, 장르를 모두 분류할 수 있는 다면적 뉴스 기사 자동 분류 모델 연구 (Research on Multi-faceted News Article Classification Models Classifying Subjects, Geographies and Genres)

  • 이효진;최성필
    • 한국문헌정보학회지 / Vol. 58, No. 3 / pp.65-89 / 2024
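  • This study built models that classify news articles separately by subject, genre, and region, using a Korean pre-trained language model. To this end, a new news article classification scheme was designed with reference to the classification schemes of Korean news organizations. The subject and genre classifiers were implemented as hierarchical models that chain a top-level (major category) model to mid-level (subcategory) models, and their performance was compared with that of a single model over the merged categories. The evaluation showed that the hierarchical models had the advantage of classifying ambiguous or overlapping categories more clearly than the merged-category model. For regional classification, a model covering 18 categories was built; because regional characteristics appear clearly in the body of region-related articles, it achieved high performance. The study shows that news articles can be classified effectively from the multiple perspectives of subject, genre, and region, and it is significant in suggesting the feasibility of a multidimensional news article classification service that meets user needs.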
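The hierarchical arrangement described in this abstract, in which a top-level prediction selects which mid-level model to run, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the toy `keyword_classifier` stands in for the pre-trained Korean language models, and every function name, category, and keyword below is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Classifier = Callable[[str], str]


def keyword_classifier(keywords: Dict[str, List[str]], default: str) -> Classifier:
    """Build a toy classifier: pick the label whose keywords occur most often."""
    def classify(text: str) -> str:
        lowered = text.lower()
        scores = {label: sum(lowered.count(word) for word in words)
                  for label, words in keywords.items()}
        best = max(scores, key=scores.get)
        # Fall back to the default label when no keyword matched at all.
        return best if scores[best] > 0 else default
    return classify


def hierarchical_classify(text: str, top: Classifier,
                          mid_by_top: Dict[str, Classifier]) -> Tuple[str, str]:
    """Run the top-level model, then the mid-level model chosen by its output."""
    top_label = top(text)
    mid_label = mid_by_top[top_label](text)
    return top_label, mid_label


# Hypothetical two-level scheme (not the scheme designed in the paper).
top_model = keyword_classifier(
    {"economy": ["stock", "bank"], "sports": ["league", "goal"]},
    default="economy",
)
mid_models = {
    "economy": keyword_classifier(
        {"finance": ["stock", "bank"], "real_estate": ["housing", "rent"]},
        default="finance",
    ),
    "sports": keyword_classifier(
        {"football": ["goal", "league"], "baseball": ["inning", "pitcher"]},
        default="football",
    ),
}

print(hierarchical_classify(
    "The pitcher closed the inning as the league race tightened.",
    top_model, mid_models,
))
```

Because each mid-level model only has to separate the subcategories under one top-level category, overlapping subcategory names in different branches never compete with each other, which is one plausible reading of the advantage the abstract reports.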

FAGON: Fake News Detection Model Using Grammatical Transformation on Deep Neural Network

  • Seo, Youngkyung;Han, Seong-Soo;Jeon, You-Boo;Jeong, Chang-Sung
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 10 / pp.4958-4970 / 2019
  • As technology advances, the amount of fake news keeps increasing for various reasons, such as political issues and exaggerated advertising. However, there has been very little research on fake news detection, especially research that uses grammatical transformation on deep neural networks. In this paper, we present a new fake news detection model, called FAGON (Fake news detection model using Grammatical transformation On deep Neural network), which efficiently determines whether a proposition is true or false for a given article by learning grammatical transformation on a neural network. In particular, our model focuses on the Korean language. It consists of two modules: a sentence generator and a classifier. The former generates multiple sentences that have the same meaning as the proposition but different grammar, by learning the grammatical transformation. The latter classifies the proposition as true or false by training on vectors generated from each sentence of the article and from the multiple sentences produced by the former module. We show that our model is designed to detect fake news effectively by exploiting various grammatical transformations and a proper classification structure.

Arabic Stock News Sentiments Using the Bidirectional Encoder Representations from Transformers Model

  • Eman Alasmari;Mohamed Hamdy;Khaled H. Alyoubi;Fahd Saleh Alotaibi
    • International Journal of Computer Science & Network Security / Vol. 24, No. 2 / pp.113-123 / 2024
  • Stock market news sentiment analysis (SA) aims to identify the attitudes that stock news on official platforms expresses toward companies' stocks. It supports sound investment decisions and analysts' evaluations. However, research on Arabic SA is limited compared with that on English SA because of the complexity of the Arabic language and its limited corpora. This paper develops a sentiment classification model to predict the polarity of Arabic stock news in microblogs. It also aims to extract the reasons that lead to the polarity categorization, as the main economic causes or aspects, based on semantic unity. To that end, this paper presents an Arabic SA approach based on the logistic regression model and the Bidirectional Encoder Representations from Transformers (BERT) model. The proposed model classifies articles as positive, negative, or neutral. It was trained on data collected from an official Saudi stock market article platform, which were then preprocessed and labeled. Moreover, the economic reasons for the articles, based on semantic unit and divided into seven economic aspects to highlight the polarity of the articles, were investigated. The supervised BERT model obtained 88% article classification accuracy based on SA, and the unsupervised mean Word2Vec encoder obtained 80% economic-aspect clustering accuracy. Predicting polarity on Arabic stock market news, together with the economic reasons behind it, would provide valuable benefits to the stock SA field.

일간지를 통해 본 주거환경문제의 연구 ( I ) - 동아일보 (1920년~1990년) 기사 유형의 변천 - (A Study of Housing Environment Problems through the Daily newspapers ( I ) - The Change of a type of the Dong-A daily papers (1920~1990) -)

  • 신경주
    • 한국주거학회논문집 / Vol. 2, No. 2 / pp.41-53 / 1991
  • This study discusses the change in housing environment problems from the early 1900s to the present, with the aim of finding solutions to serious housing environment problems. The documentary research method was used. The content analysis covered articles (N = 1129) about the housing environment published in The Dong-A daily news from 1920 (the first edition) to December 31, 1990. The study examined changes such as the total number of articles over time, the importance of articles (number of columns per article), the classification of article subjects, and the number of articles by subject. On the basis of these data, a chronological classification of the changes in housing environment problems over 70 years was made. The overall results are expected to supply accurate information about the housing environment to our people, provide opportunities for people to take part in protecting the housing environment, and contribute to solving housing environment problems. In future work, the author plans an in-depth analysis of article content by subject.


토픽모델링을 활용한 해운물류 뉴스 분석 (Analysis of Shipping and Logistics News Articles using Topic Modeling)

  • 윤희영;곽일엽
    • 무역학회지 / Vol. 46, No. 4 / pp.61-76 / 2021
  • This study focuses on three logistics-related news outlets (Logistics Newspaper, Korea Shipping Gadget, and Korea Shipping Newspaper) in order to present changes in logistics issues, centering on COVID-19, which has recently had the greatest impact worldwide. For data collection, two years of news articles from 2019 and 2020 (title, article, content, date, article classification, article URL) were collected by web crawling (using Python's BeautifulSoup and requests modules) from the homepages of the three representative logistics-related media companies. The data analysis methods were basic statistical analysis, Latent Dirichlet Allocation (LDA) for topic modeling, and Scattertext. The analysis results were as follows. First, among the three logistics-related news outlets, the Korea Shipping Newspaper was the most active. Second, topic modeling with LDA identified eight logistics-related topics, and the keywords and significant issues of each topic were presented. Third, the keywords were visualized with Scattertext. This is the first study to present changes in the logistics field by focusing on articles from representative logistics-related media in 2019 and 2020. In particular, 2019 and 2020 can be divided into the periods before and after the outbreak of COVID-19, which has had a great impact not only on the logistics field but also on our lives as a whole. Future work requires a multi-faceted approach, such as comparative studies of logistics issues between countries or implications drawn from long-term time-series articles.

가짜뉴스(Fake News) 현황분석을 통해 본 디지털매체 시대의 쟁점과 뉴스콘텐츠 제작 가이드라인 (Controversy and Guideline Suggestion Surrounding Fake News in the Digital Media Age)

  • 권만우;전용우;임하진
    • 한국멀티미디어학회논문지 / Vol. 18, No. 11 / pp.1419-1426 / 2015
  • The border between news and advertising is disappearing. Traditional journalism held that the editorial side deals with news and the advertising side handles commercial messages, but this classification is now meaningless. Current news consumers do not separate advertising content from non-advertising content. In Korea, the production of fake news or paid news pages is becoming a social problem. Fake news uses various camouflages to pass as real news. This paper descriptively analyzes Korean fake news cases and suggests guidelines for publishing news. We analyzed three major newspaper websites from July to September 2014. These three newspapers publish section pages every day that contain fake news or sponsored news. In total, more than one thousand articles were selected for content analysis. We coded the number of fake news items, the day of the week, the rate of sponsored news, the average number of fake news items published per page, the conformity between news and advertising, and the type of fake news. We also coded the number of sponsored news articles in the day sections. Our method compared advertising content with news articles. As a result, 24.8% of news articles were found to be published for advertising sponsors. Advertorials or fake news were sometimes arranged on the same pages on the same day. We coded the conformity between corresponding advertising and news content: 60.9% of fake news items matched their sponsors. The PR style of fake news was the most common type, and the advertising type was the least common.

뉴스 댓글의 감정 분류를 위한 자질 가중치 설정 (Feature Weighting for Opinion Classification of Comments on News Articles)

  • 이공주;김재훈;서형원;류길수
    • Journal of Advanced Marine Engineering and Technology / Vol. 34, No. 6 / pp.871-879 / 2010
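  • This paper proposes a system that classifies users' sentiment in comments on news articles. The proposed system is a machine-learning-based document classifier for comments. Unlike ordinary documents, a comment is attached to an article body, and the content of that body can influence the reader's sentiment. Drawing on this characteristic of comments and on various resources, the paper proposes features for sentiment classification and a method for weighting them. Experiments showed that the weighting method is effective for classifying sentiment in comments on Korean news. They also confirmed that character-level two-syllable and three-syllable features are sufficiently useful for documents that contain many errors, such as comments. The authors plan to apply the approach to general sentiment analysis, including product reviews as well as news comments.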
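The character-level two-syllable and three-syllable features mentioned in this abstract can be illustrated with a short sketch. In Hangul each character is one syllable block, so a character n-gram is a syllable n-gram. The sketch assumes that whitespace is dropped before extraction and that features are raw counts; the abstract does not specify either choice or the paper's weighting scheme, and the function names are hypothetical.

```python
from collections import Counter
from typing import List


def syllable_ngrams(text: str, n: int) -> List[str]:
    """Return overlapping n-grams of characters, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_features(comment: str) -> Counter:
    """Count 2-syllable and 3-syllable n-grams in a single comment."""
    return Counter(syllable_ngrams(comment, 2) + syllable_ngrams(comment, 3))


print(syllable_ngrams("좋아요", 2))
print(ngram_features("정말 좋아요"))
```

Features of this kind do not depend on a morphological analyzer segmenting the text correctly, which is a likely reason they remain useful on misspelled or irregularly spaced comments.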

복수의 신문기사 자동요약에 관한 실험적 연구 (An Experimental Study on Automatic Summarization of Multiple News Articles)

  • 김용광;정영미
    • 정보관리학회지 / Vol. 23, No. 1 / pp.83-98 / 2006
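  • This study presents a template-based summarization technique that uses the semantic categories of sentences to summarize multiple news articles automatically. In the training phase, the semantic categories of the key information to be included in summaries of incident/accident news articles are identified, and cue words are then selected for each slot of the template. In the summarization phase, the input news articles are first grouped by incident/accident, and key sentences are extracted from each article to fill the template slots. Finally, the sentences are split into simple sentences, the template content is revised, and the summary is generated from it. In the evaluation of the automatically generated summaries, summary precision and recall were 0.541 and 0.581, respectively, and the summary sentence redundancy rate was 0.116.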

재해기상 언론기사 빅데이터를 활용한 피해정보 자동 분류기 개발 (Developing and Evaluating Damage Information Classifier of High Impact Weather by Using News Big Data)

  • 조수지;이기광
    • 산업경영시스템학회지 / Vol. 46, No. 3 / pp.7-14 / 2023
  • Recently, the importance of impact-based forecasting has increased as the socio-economic impacts of severe weather have emerged. Because news articles contain unstructured information closely related to people's lives, this study developed and evaluated a binary classification algorithm for snowfall damage information using text mining of media articles. We collected news articles from 2009 to 2021 that contain 'heavy snow' in the body text and labelled whether each article corresponds to specific damage fields such as car accidents. To develop a classifier, we proposed a probability-based classifier based on the ratio of two conditional probabilities, defined as the I/O Ratio in this study. During construction, we also adopted the n-gram approach to consider the contextual meaning of each keyword. The accuracy of the classifier was 75%, supporting the possibility of applying news big data to impact-based forecasting. We expect the performance of the classifier to improve in further research as more varied training data accumulate. The results of this study can readily be extended by applying the same methodology to other disasters in the future. Furthermore, they can help reduce the social and economic damage of high-impact weather by supporting the establishment of an integrated meteorological decision support system.
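A ratio-of-conditional-probabilities classifier over n-grams, of the general kind this abstract describes, might look like the sketch below. The abstract does not define the two probabilities, so this is an assumption-laden illustration rather than the paper's I/O Ratio: here the ratio is taken to be P(n-gram | damage article) / P(n-gram | non-damage article), estimated from document frequencies with add-alpha smoothing, and an article is flagged when the summed log-ratios of its n-grams are positive (a naive-Bayes-style decision rule). All names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def word_ngrams(text: str, n: int = 2) -> List[str]:
    """Return word-level n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fit_ratio(articles: List[str], labels: List[int],
              n: int = 2, alpha: float = 1.0) -> Dict[str, float]:
    """Estimate P(ngram | label 1) / P(ngram | label 0) per n-gram (assumed form)."""
    in_docs, out_docs = Counter(), Counter()
    n_in = sum(labels)
    n_out = len(labels) - n_in
    for text, label in zip(articles, labels):
        # Document frequency: count each n-gram at most once per article.
        (in_docs if label else out_docs).update(set(word_ngrams(text, n)))
    vocab = set(in_docs) | set(out_docs)
    return {
        g: ((in_docs[g] + alpha) / (n_in + 2 * alpha))
        / ((out_docs[g] + alpha) / (n_out + 2 * alpha))
        for g in vocab
    }


def predict(text: str, ratios: Dict[str, float], n: int = 2) -> bool:
    """Flag the article when the summed log-ratios of known n-grams exceed 0."""
    score = sum(math.log(ratios[g]) for g in set(word_ngrams(text, n)) if g in ratios)
    return score > 0


# Toy training data, invented for illustration only.
articles = [
    "heavy snow caused car accident on highway",
    "heavy snow caused car accident downtown",
    "heavy snow forecast for the weekend",
    "heavy snow forecast delights ski resorts",
]
labels = [1, 1, 0, 0]
ratios = fit_ratio(articles, labels)

print(predict("heavy snow led to a car accident", ratios))
print(predict("heavy snow forecast tomorrow", ratios))
```

In this toy setup the bigram "heavy snow" appears equally in both classes and so contributes nothing, while "car accident" pushes the score up; using bigrams rather than single words is what lets a keyword's context affect the decision.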

Prediction of Stock Returns from News Article's Recommended Stocks Using XGBoost and LightGBM Models

  • Yoo-jin Hwang;Seung-yeon Son;Zoon-ky Lee
    • 한국컴퓨터정보학회논문지 / Vol. 29, No. 2 / pp.51-59 / 2024
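  • Investors draw on a variety of information, including news articles, to set investment strategies that maximize returns. Accordingly, Korean news organizations publish stock recommendation articles based on analysts' stock analysis reports in order to provide reliable investment information. This study treats the publication of a stock recommendation article as an event and presents classification models, built with XGBoost and LightGBM, that predict whether the price rises or falls 10 days after publication. The recommended stocks are also divided into four groups by market (KOSPI and KOSDAQ) and firm size (large/small) to determine whether prediction accuracy differs across subgroups. After training, the overall classification accuracy was 75% for XGBoost and 71% for LightGBM; prediction accuracy was higher for the KOSPI market than for KOSDAQ stocks, and higher for large-cap stocks than for small-cap stocks. Finally, SHAP (Shapley Additive exPlanations) analysis was used to examine the variables important to each model's predictions and to improve model interpretability.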
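The event framing in this abstract, a binary rise/fall label taken 10 days after publication and accuracy reported per market and size subgroup, can be sketched without the models themselves. This is an illustrative sketch only: it assumes one closing price per row and counts the 10-day horizon in rows (the abstract does not say whether trading or calendar days are meant), and the function names and sample records are hypothetical. The XGBoost and LightGBM classifiers are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def label_event(closes: List[float], event_idx: int, horizon: int = 10) -> Optional[int]:
    """Return 1 if the close `horizon` rows after the event is higher, else 0.

    Returns None when the price series ends before the horizon is reached.
    """
    target = event_idx + horizon
    if target >= len(closes):
        return None
    return int(closes[target] > closes[event_idx])


def accuracy_by_group(
    records: Iterable[Tuple[str, str, int, int]]
) -> Dict[Tuple[str, str], float]:
    """Compute accuracy per (market, size) from (market, size, y_true, y_pred) rows."""
    hits: Dict[Tuple[str, str], int] = defaultdict(int)
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for market, size, y_true, y_pred in records:
        key = (market, size)
        totals[key] += 1
        hits[key] += int(y_true == y_pred)
    return {key: hits[key] / totals[key] for key in totals}


# Invented sample data for illustration.
closes = [100.0 + i for i in range(12)]
print(label_event(closes, 0))
print(accuracy_by_group([
    ("KOSPI", "large", 1, 1),
    ("KOSPI", "large", 0, 1),
    ("KOSDAQ", "small", 0, 0),
]))
```

Splitting accuracy by subgroup in this way is what allows a comparison such as the one the abstract reports between markets and between large-cap and small-cap stocks.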