Research on Multi-faceted News Article Classification Models Classifying Subjects, Geographies and Genres

A Study on a Multi-faceted Automatic News Article Classification Model for In-depth Topics, Regions, and Genres (translated from the Korean title)

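  • 이효진 (Department of Library and Information Science, Kyonggi University);
  • 최성필 (Department of Library and Information Science, Kyonggi University)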
  • Received : 2024.07.26
  • Accepted : 2024.08.20
  • Published : 2024.08.31

Abstract

This study builds models that classify news articles separately by topic, genre, and region, using a Korean pre-trained language model. To this end, a new news article classification scheme was designed with reference to the classification systems of domestic media outlets. The topic and genre models were implemented as hierarchical classifiers that link a main-category model to subcategory models, and their performance was compared with that of an integrated category model. In the evaluation, the hierarchical models classified articles in ambiguous or overlapping categories more clearly than the integrated category model. For regional classification, a model covering 18 categories was built; because regional characteristics are stated clearly in the body of region-related articles, this model achieved high performance. The study shows that news articles can be classified effectively from multiple perspectives (topic, genre, and region) and thereby suggests the feasibility of a multi-dimensional news article classification service that meets user needs.
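
The hierarchical structure described in the abstract, in which a main-category prediction selects the subcategory classifier, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `keyword_classifier` and `HierarchicalClassifier`, the toy English labels, and the keyword scorers that stand in for the fine-tuned Korean pre-trained language model. It is not the authors' implementation, and the integrated category model used as the baseline is not shown.

```python
from typing import Callable, Dict, Tuple

# A classifier maps article text to a label. In the study this role is played by a
# fine-tuned Korean pre-trained language model; here it is a toy keyword scorer.
Classifier = Callable[[str], str]


def keyword_classifier(keywords: Dict[str, set]) -> Classifier:
    """Build a toy classifier that picks the label with the most keyword hits."""
    def classify(text: str) -> str:
        tokens = set(text.lower().split())
        # max() returns the first label on ties, so dict order acts as a tie-breaker.
        return max(keywords, key=lambda label: len(tokens & keywords[label]))
    return classify


class HierarchicalClassifier:
    """Predict a main category first, then route to that category's sub-classifier."""

    def __init__(self, main: Classifier, subs: Dict[str, Classifier]):
        self.main = main
        self.subs = subs

    def predict(self, text: str) -> Tuple[str, str]:
        main_label = self.main(text)
        sub_label = self.subs[main_label](text)
        return main_label, sub_label


# Hypothetical two-level label scheme (not the paper's classification system).
main_clf = keyword_classifier({
    "sports": {"match", "league", "goal", "pitcher"},
    "economy": {"stock", "market", "rate", "bank"},
})
sub_clfs = {
    "sports": keyword_classifier({
        "football": {"goal", "striker"},
        "baseball": {"pitcher", "inning"},
    }),
    "economy": keyword_classifier({
        "finance": {"bank", "rate"},
        "securities": {"stock", "shares"},
    }),
}

model = HierarchicalClassifier(main_clf, sub_clfs)
print(model.predict("The pitcher closed the final inning of the league match"))
```

Because each subcategory classifier only chooses among the children of one main category, labels that look similar across different main categories never compete directly, which is one plausible reading of why the hierarchical models handled ambiguous or overlapping categories more clearly.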
