A Comparative Study of Text Analysis and Network Embedding Methods for Effective Fake News Detection

  • Park, Sung Soo (박성수) (SKK Business School, Sungkyunkwan University) ;
  • Lee, Kun Chang (이건창) (Global Business Administration / Dept. of Health Sciences & Technology, SAIHST (Samsung Advanced Institute for Health Science & Technology), Sungkyunkwan University)
  • Received : 2019.02.21
  • Accepted : 2019.05.20
  • Published : 2019.05.28

Abstract

Fake news is a form of misinformation that benefits from the rapid spread of information on media platforms where users interact, such as social media. The recent increase in fake news has caused many social problems. In this paper, we propose a method for detecting fake news. Previous research on fake news detection has relied mainly on text analysis. This study instead focuses on the network through which news spreads on social media: it generates features with DeepWalk, a network embedding method, and classifies fake news using logistic regression. We conducted fake news detection experiments using 211 publicly available online news articles and 1.2 million items of news diffusion network data. The results show that the accuracy of fake news detection using network embedding is between 1.7% and 10.6% higher than that of text analysis. In addition, fake news detection that combines text analysis and network embedding shows no increase in accuracy over network embedding alone. Companies and organizations can use the results of this study to detect fake news spreading online.
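As a rough illustration of the network side of this approach, the sketch below shows only the first stage of DeepWalk: sampling truncated random walks over a graph. The toy diffusion network, the node names, and the function names are hypothetical and are not taken from the paper's data or code. In DeepWalk the sampled walks are then treated as sentences and passed to a skip-gram model to learn node vectors; that training step is omitted here.

```python
import random


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def random_walk(adj, start, walk_length, rng):
    """Sample one truncated random walk by hopping to uniformly chosen neighbors."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # isolated node: the walk cannot continue
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(adj, walks_per_node, walk_length, seed=0):
    """Start `walks_per_node` walks from every node, shuffling node order each pass."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adj, node, walk_length, rng))
    return walks


# Hypothetical toy diffusion network: users u1..u4 share news items n1 and n2.
EDGES = [("u1", "n1"), ("u2", "n1"), ("u2", "n2"), ("u3", "n2"), ("u4", "n2")]
adj = build_adjacency(EDGES)
walks = generate_walks(adj, walks_per_node=2, walk_length=5)

for walk in walks[:3]:
    print(" -> ".join(walk))
```

Nodes that appear in similar walk contexts end up with similar vectors after skip-gram training, which is what lets a downstream classifier use diffusion structure as features.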

Fig. 1. DeepWalk overview

Fig. 2. Fake news detection procedure combining text analysis and network embedding methods

Table 1. Statistics for the fake news detection data set

Table 2. Fake news detection experiment results using text analysis and graph embedding

Table 3. Fake news detection results combining text analysis and graph embedding
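The combined setting can be sketched as follows, under two assumptions that the source does not confirm: that text features and network embedding features are joined by simple concatenation, and that the classifier is a plain logistic regression trained by batch gradient descent. All feature values and labels below are made-up toy data, and the function names are illustrative only.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 1 (fake) if the predicted probability is at least 0.5, else 0 (real)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def accuracy(w, b, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


# Made-up toy features for four news items (1 = fake, 0 = real).
text_features = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]]
network_features = [[1.0, 0.0], [0.9, 0.2], [0.1, 0.8], [0.0, 1.0]]
labels = [1, 1, 0, 0]

# Assumed combination step: concatenate the two feature vectors per news item.
combined = [t + n for t, n in zip(text_features, network_features)]

w, b = train_logistic_regression(combined, labels)
train_acc = accuracy(w, b, combined, labels)
print("training accuracy:", train_acc)
```

Training accuracy on four hand-made points says nothing about real performance; a faithful comparison would evaluate text-only, network-only, and combined features on held-out data.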
