A Study on the Effect of the Document Summarization Technique on the Fake News Detection Model

  • Shim, Jae-Seung (Graduate School of Business IT, Kookmin University)
  • Won, Ha-Ram (Graduate School of Business IT, Kookmin University)
  • Ahn, Hyunchul (Graduate School of Business IT, Kookmin University)
  • Received : 2019.08.08
  • Accepted : 2019.09.20
  • Published : 2019.09.30

Abstract

Fake news has emerged as a significant issue over the last few years, prompting discussion and research on how to address it. In particular, studies on automated fact-checking and fake news detection using artificial intelligence and text analysis techniques have drawn attention. Fake news detection is a form of document classification, so document classification techniques have been widely used in this research. Document summarization techniques, however, have rarely been applied in this field. Meanwhile, automatic news summarization services have become popular, and a recent study reported that using news condensed through abstractive summarization improved the predictive performance of fake news detection models. It is therefore worth examining whether document summarization can be integrated into fake news detection on domestic (Korean) news data. To examine the effect of extractive summarization on fake news detection, we first summarized news articles through extractive summarization, then built a detection model based on the summarized news, and finally compared it with a detection model based on the full text. BPN (Back Propagation Neural Network) and SVM (Support Vector Machine) showed no large difference in performance between the two models; for DT (Decision Tree), the full-text-based model performed somewhat better, whereas for LR (Logistic Regression), the summary-based model performed somewhat better. The differences between the summary-based and full-text-based models were not statistically significant. The results nonetheless suggest that summarization preserves at least the core information needed to identify fake news, and that with LR there is a possibility of performance improvement.
This study is an experimental application of extractive summarization to fake news detection using several machine-learning algorithms. Its main limitations are the relatively small amount of data and the lack of comparison among different summarization techniques. Future work should therefore apply a wider range of analytical techniques to a larger volume of data.

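Over the past several years, as fake news has become a global issue, discussion and research aimed at solving the problem have continued. Research on automated fake news detection using artificial intelligence and text analysis has drawn particular attention; most of this work relies on document classification techniques, while document summarization techniques have hardly been used so far. Recently, however, a study abroad reported improved performance from applying abstractive summarization to fake news detection, and now that automatic news summarization services based on extractive summarization have become widespread, it is worth checking whether summarized news has a positive effect on the performance of domestic (Korean) fake news detection models. This study therefore asks whether applying summarization to domestic fake news causes information loss, preserves the information as is, or yields an information gain through noise removal. To find out, we applied extractive summarization to domestic news data, built a 'full-text-based fake news detection model' and a 'summary-based fake news detection model', and ran experiments comparing the two with several machine learning algorithms. With BPN (Back Propagation Neural Network) and SVM (Support Vector Machine), no large performance difference appeared; with DT (Decision Tree) the full-text-based model performed somewhat better, and with LR (Logistic Regression) the summary-based model performed somewhat better. When the results were validated, the difference between the summary-based and full-text-based models was not statistically significant, but we confirmed that summarization preserves at least the core information that helps identify fake news and that, with LR, there is a possibility of performance improvement. This study is meaningful as a challenging first attempt to apply extractive summarization to domestic fake news detection research. Its limitations are that the experiments used a relatively small amount of data and only a single document summarization technique. Follow-up research should verify whether the same pattern of results holds on large-scale data and should apply a wider variety of summarization techniques to clarify the differences among them.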
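
To make the first step of the procedure concrete, the following is a minimal sketch of extractive summarization: sentences are scored by graph centrality and the top few are kept verbatim. The abstract does not state which extractive algorithm or tokenizer the authors used, so everything here is an assumption for illustration: the function names (`split_sentences`, `cosine`, `extractive_summary`), the punctuation-based sentence splitter, the term-frequency cosine similarity, and the PageRank-style scoring with its `damping` and `iters` parameters.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., !, ? followed by whitespace (an assumption;
    real Korean news text would need a proper sentence tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extractive_summary(text, k=2, damping=0.85, iters=50):
    """Return the k most central sentences, kept in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return sentences
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    # Sentence similarity graph without self-loops.
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):  # PageRank-style power iteration
        updated = []
        for i in range(n):
            inbound = sum(scores[j] * sim[j][i] / row_sums[j]
                          for j in range(n) if row_sums[j] > 0)
            updated.append((1 - damping) / n + damping * inbound)
        scores = updated
    # Pick the k highest-scoring sentences, then restore document order.
    top = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    return [sentences[i] for i in top]
```

In an experiment of the kind described above, the output of such a summarizer would be vectorized and passed to the same classifiers (BPN, SVM, DT, LR) as the full text, so that the two inputs can be compared under otherwise identical conditions.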
