BERT-Based Logits Ensemble Model for Gender Bias and Hate Speech Detection

  • Received: 2022.09.06
  • Reviewed: 2023.01.29
  • Published: 2023.10.31

Abstract

Malicious hate speech and gender-biased comments are common in online communities and cause social problems. Gender bias and hate speech detection has been investigated, but it remains difficult because such content can be expressed in many different ways. To address this problem, we attempted to detect malicious comments in a Korean hate speech dataset constructed in 2020. We explored bidirectional encoder representations from transformers (BERT)-based deep learning models that use hyperparameter tuning, data sampling, and logits ensembles with a label distribution. We evaluated our models in Kaggle competitions for gender bias, general bias, and hate speech detection. For gender bias detection, an ensemble of the Soongsil-BERT and KcELECTRA models achieved an F1-score of 0.7711. For the general bias task, which includes the gender bias task, the ensemble model achieved the best F1-score of 0.7166.
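
As an illustration of what a logits ensemble can look like, the following minimal Python sketch averages the per-class logits of several classifiers for one comment and then picks the highest-scoring class. It is not the authors' implementation: the function names, the equal default weights, the logit values, and in particular the way the label distribution enters (added as a log-prior to the averaged logits) are assumptions made for this example, since the abstract does not specify these details.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_logits(per_model_logits, weights=None, log_prior=None):
    """Weighted average of per-model logits for a single example.

    per_model_logits: one list of class logits per model (hypothetical values).
    weights: optional per-model weights; defaults to equal weights (assumption).
    log_prior: optional log of the label distribution, added to the averaged
        logits. This use of the label distribution is an assumption.
    """
    n_models = len(per_model_logits)
    n_classes = len(per_model_logits[0])
    if weights is None:
        weights = [1.0 / n_models] * n_models
    combined = [
        sum(w * logits[c] for w, logits in zip(weights, per_model_logits))
        for c in range(n_classes)
    ]
    if log_prior is not None:
        combined = [z + p for z, p in zip(combined, log_prior)]
    return combined


def predict(per_model_logits, weights=None, log_prior=None):
    """Return the index of the class with the highest combined logit."""
    combined = ensemble_logits(per_model_logits, weights, log_prior)
    return max(range(len(combined)), key=combined.__getitem__)


# Made-up logits from two hypothetical models for one comment, two classes.
model_a = [2.0, 0.5]
model_b = [0.2, 1.0]

plain = predict([model_a, model_b])
with_prior = predict(
    [model_a, model_b],
    log_prior=[math.log(0.2), math.log(0.8)],  # made-up label distribution
)
probabilities = softmax(ensemble_logits([model_a, model_b]))
```

Averaging logits rather than hard votes keeps each model's confidence in the combined score, which is one common reason to ensemble at the logit level.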

Keywords

Acknowledgement

This study was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1F1A1061433) and partly by the Synapsoft ATC+ grant funded by the Ministry of Trade, Industry and Energy (MOTIE, Korea).
