Facebook Spam Post Filtering based on Instagram-based Transfer Learning and Meta Information of Posts

  • Kim, Junhong (School of Industrial Management Engineering, Korea University) ;
  • Seo, Deokseong (School of Industrial Management Engineering, Korea University) ;
  • Kim, Haedong (School of Industrial Management Engineering, Korea University) ;
  • Kang, Pilsung (School of Industrial Management Engineering, Korea University)
  • Received : 2016.08.13
  • Accepted : 2017.02.18
  • Published : 2017.06.15

Abstract

This study develops a text spam filtering system for Facebook based on two categories of variables: keywords learned from Instagram and meta-information of Facebook posts. Since there are no explicit labels for spam/ham posts, we use Instagram hashtags to train classification models. In addition, filtering accuracy is improved by considering the meta-information of Facebook posts. To verify the proposed filtering system, we conduct an empirical experiment on 1,795,067 Facebook documents and 761,861 Instagram documents. With random forest as the base classification algorithm, the experimental results show that the proposed filtering system achieves a filtering accuracy of 99% and an F1-measure of 98%. We expect that the proposed filtering scheme can be applied to other web services that suffer from massive numbers of spam posts but have no explicit spam labels available.
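The abstract only outlines the pipeline, so the following Python sketch illustrates the general idea under stated assumptions; it is not the authors' implementation. Instagram posts are weakly labeled by hashtags, hashtags are stripped before keyword extraction, binary keyword indicators are concatenated with numeric post meta-information, and accuracy and F1 are computed from predictions. The hashtag sets `SPAM_TAGS` and `HAM_TAGS`, the meta fields in `META_KEYS`, and all function names are hypothetical. The classifier itself is left out; the resulting vectors could be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical hashtag sets used as weak labels (the paper's actual tags are not given here).
SPAM_TAGS = {"#followforfollow", "#l4l"}
HAM_TAGS = {"#dailylife", "#travel"}
# Hypothetical numeric meta-information fields of a Facebook post.
META_KEYS = ["num_links", "num_comments"]

HASHTAG = re.compile(r"#\w+")
WORD = re.compile(r"\w+")


def weak_label(post: str) -> Optional[int]:
    """Return 1 (spam), 0 (ham), or None if the post has no labeling hashtag."""
    tags = {t.lower() for t in HASHTAG.findall(post)}
    if tags & SPAM_TAGS:
        return 1
    if tags & HAM_TAGS:
        return 0
    return None


def keywords(post: str) -> list[str]:
    """Lowercase tokens with hashtags removed, so labels do not leak into features."""
    return WORD.findall(HASHTAG.sub(" ", post.lower()))


def build_vocab(labeled: list[tuple[str, int]], min_count: int = 2) -> list[str]:
    """Keep keywords that occur in at least `min_count` labeled posts (document frequency)."""
    counts: dict[str, int] = {}
    for text, _ in labeled:
        for w in set(keywords(text)):
            counts[w] = counts.get(w, 0) + 1
    return sorted(w for w, c in counts.items() if c >= min_count)


def featurize(text: str, meta: dict[str, float], vocab: list[str],
              meta_keys: list[str]) -> list[float]:
    """Binary keyword indicators followed by numeric meta-information (missing -> 0)."""
    present = set(keywords(text))
    kw = [1.0 if w in present else 0.0 for w in vocab]
    return kw + [float(meta.get(k, 0.0)) for k in meta_keys]


def accuracy_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Accuracy and spam-class F1 = 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return acc, f1


if __name__ == "__main__":
    insta = [
        "Free likes now #followforfollow #l4l",
        "Sunset at the beach #travel",
        "Free likes, click the link #f4f #l4l",
        "Coffee with friends #dailylife",
        "Beach walk with friends #travel",
        "just a photo",  # no labeling hashtag -> excluded
    ]
    labeled = [(p, y) for p in insta if (y := weak_label(p)) is not None]
    vocab = build_vocab(labeled)
    print(vocab)  # ['beach', 'free', 'friends', 'likes', 'the', 'with']
```

Stripping hashtags before tokenizing is the key design choice here: if the labeling tags remained in the text, a classifier could simply learn those tags rather than keywords that transfer to unlabeled Facebook posts.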
