Hate Speech Detection Using Modified Principal Component Analysis and Enhanced Convolution Neural Network on Twitter Dataset

  • Majed, Alowaidi (Department of Information Technology, College of Computer and Information Sciences, Majmaah University)
  • Received: 2023.01.05
  • Published: 2023.01.30

Abstract

Originally used to network computers and support communication, the Internet has evolved continuously since its inception and is now the backbone of many web services, including social media. Social networking, which emerged in the early 1990s, has grown alongside it. Social Networking Sites (SNSs) sprang up and have remained an important part of Internet usage, mainly because of the services they provide. Twitter and Facebook have become the primary means by which most people keep in touch with others and hold substantive conversations. These sites let users post photos and videos and store audio and video content that can be shared with other users. Although attractive, these features have also created problems for the sites, such as the posting of offensive material. Some SNS users promote hate through their words or speech, and such content is difficult to curtail once it has been uploaded. This article therefore outlines a process for extracting user reviews from a Twitter corpus in order to identify instances of hate speech. Using Modified Principal Component Analysis (MPCA) and an Enhanced Convolutional Neural Network (ECNN), we identify instances of hate speech in the text, and Natural Language Processing (NLP) makes it possible to build a fully automated system for analyzing syntax and meaning. The system is built around three stages: pre-processing, feature extraction, and classification. Pre-processing normalizes the text by removing extra spaces, punctuation, and stop words. The pre-processed text is then passed to the feature extraction stage, which uses the MPCA algorithm to take a set of related features and extract those that carry the most information about the input dataset. Finally, the proposed classification method, ECNN, is applied to detect hate speech or abusive language. We argue that ECNN is superior to other methods for identifying hateful content online because it can process large amounts of data and return accurate results quickly, especially on larger datasets. As a result, the proposed MPCA+ECNN algorithm improves accuracy, precision, recall, and F-measure.
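The normalization step described in the abstract (removing extra spaces, punctuation, and stop words) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `normalize_tweet` and the tiny stop-word list are hypothetical, and lowercasing is an added assumption; a real pipeline would use a complete stop-word list.

```python
import string

# Hypothetical, deliberately tiny stop-word list (assumption); a real system
# would use a complete list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "at"}

# Translation table that deletes every ASCII punctuation character.
# Note: this also strips '#' and '@' from hashtags and user mentions.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_tweet(text: str) -> str:
    """Lowercase, strip punctuation, drop stop words, and collapse whitespace."""
    text = text.lower()  # assumption: letter case carries no signal here
    text = text.translate(_PUNCT_TABLE)
    tokens = text.split()  # split() with no argument also collapses runs of spaces
    kept = [tok for tok in tokens if tok not in STOP_WORDS]
    return " ".join(kept)
```

For example, `normalize_tweet("  The  game is ON!!!   Get   to the stadium... ")` returns `"game get stadium"`. Whether to keep hashtag and mention markers is a design choice the abstract does not settle.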
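The abstract does not say how MPCA modifies standard Principal Component Analysis, so the sketch below shows only standard PCA, the technique MPCA builds on: it centers a numeric feature matrix (for example, vectorized tweets, an assumption), estimates the top eigenvectors of the covariance matrix by power iteration with deflation, and projects the data onto them. The names `pca`, `_center`, and `_covariance` are hypothetical; in practice a library implementation would normally be used.

```python
import math
import random


def _center(rows):
    """Subtract each column's mean so that every feature has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def _covariance(centered):
    """Sample covariance matrix (d x d) of column-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def pca(rows, k, iters=500, seed=0):
    """Standard PCA (not MPCA): project rows onto the top-k principal components.

    Finds eigenvectors of the covariance matrix by power iteration, removing
    each one (deflation) before searching for the next. Assumes at least two
    rows and k no larger than the rank of the data.
    Returns (components, projected): k unit vectors and the n x k projections.
    """
    rng = random.Random(seed)
    centered = _center(rows)
    cov = _covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        # Random positive start vector, normalized to unit length.
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v gives the eigenvalue (variance along v).
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: subtract this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return components, projected
```

Keeping only the first few components retains the directions of greatest variance, which is the sense in which PCA keeps the features that "carry the most information"; any additional selection criterion used by MPCA is not described here.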
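The abstract does not describe what ECNN adds to a convolutional network, so the following is only a hypothetical forward pass of a plain text CNN: one 1-D convolution layer with ReLU over a sequence of token embeddings, global max pooling, and a sigmoid output giving a hate-speech probability. The weights are supplied by hand, training is omitted, and the function names `conv1d_relu` and `text_cnn_score` are assumptions for illustration.

```python
import math


def conv1d_relu(seq, filters, biases):
    """Apply each 1-D convolution filter along the token axis, followed by ReLU.

    seq:     list of L token embeddings, each a list of E floats.
    filters: list of F filters; each filter is w rows of E weights (width w).
    biases:  list of F bias terms.
    Returns F feature maps; the map for a width-w filter has L - w + 1 entries.
    """
    maps = []
    for filt, bias in zip(filters, biases):
        width = len(filt)
        fmap = []
        for start in range(len(seq) - width + 1):
            # Dot product of the filter with a window of `width` consecutive tokens.
            s = bias + sum(filt[k][e] * seq[start + k][e]
                           for k in range(width) for e in range(len(filt[k])))
            fmap.append(max(0.0, s))  # ReLU
        maps.append(fmap)
    return maps


def text_cnn_score(seq, filters, biases, out_weights, out_bias):
    """Conv + ReLU -> global max pooling -> linear layer -> sigmoid probability."""
    # Global max pooling keeps each filter's strongest response; a filter wider
    # than the sequence yields an empty map, which is pooled here as 0.0.
    pooled = [max(fmap, default=0.0) for fmap in conv1d_relu(seq, filters, biases)]
    logit = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    # Numerically stable logistic sigmoid.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)
```

Max pooling makes the score independent of where in the tweet a matching n-gram appears, which is one common reason CNNs are used for short-text classification.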
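The abstract reports accuracy, precision, recall, and F-measure. For reference, these metrics follow from the confusion counts as sketched below; the sketch covers the binary case with hate speech as the positive class (an assumption, since multi-class evaluation would instead average per-class scores), and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F-measure (F1) for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With `y_true = [1, 0, 1, 1, 0, 0]` and `y_pred = [1, 0, 0, 1, 1, 0]` there are two true positives, one false positive, and one false negative, so precision, recall, and F-measure are all 2/3 and accuracy is 4/6.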
