BERT-Based Logits Ensemble Model for Gender Bias and Hate Speech Detection

Sanggeon Yun; Seungshik Kang; Hyeokman Kim

doi:10.3745/JIPS.04.0287

Journal of Information Processing Systems

Volume 19, Issue 5 / Pages 641-651 / 2023 / 1976-913X (pISSN) / 2092-805X (eISSN)

Korea Information Processing Society


Sanggeon Yun (Dept. of Computer Science, Kookmin University) ;
Seungshik Kang (Dept. of Computer Science, Kookmin University) ;
Hyeokman Kim (Dept. of Computer Science, Kookmin University)

Received: 2022.09.06
Accepted: 2023.01.29
Published: 2023.10.31

Abstract

Malicious hate speech and gender-biased comments are common in online communities and cause social problems. Detecting gender bias and hate speech has been studied, but it remains difficult because such content can be expressed in many different ways. To address this problem, we attempted to detect malicious comments in a Korean hate speech dataset constructed in 2020. We explored deep learning models based on bidirectional encoder representations from transformers (BERT), using hyperparameter tuning, data sampling, and logits ensembles with a label distribution. We evaluated our models in Kaggle competitions for gender bias, general bias, and hate speech detection. For gender bias detection, an ensemble of the Soongsil-BERT and KcELECTRA models achieved an F1-score of 0.7711. For the general bias task, which includes the gender bias task, the ensemble model achieved the best F1-score of 0.7166.
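
The abstract does not spell out how the logits ensemble or the label distribution is applied, so the following is only a minimal sketch of one common reading: average the per-class logits from several classifiers, then optionally shift them by the log of the class frequencies. The function names, the example logits, and the prior values are all hypothetical and are not taken from the paper.

```python
import math


def ensemble_logits(per_model_logits, weights=None):
    """Weighted average of class logits from several models for one comment.

    Assumption: every model outputs logits over the same classes in the same order.
    """
    n_models = len(per_model_logits)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # plain average by default
    n_classes = len(per_model_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, per_model_logits))
        for c in range(n_classes)
    ]


def apply_label_prior(logits, label_prior):
    """Add log class frequencies to the logits.

    This is one possible interpretation of "with a label distribution",
    not a confirmed description of the authors' method.
    """
    return [z + math.log(p) for z, p in zip(logits, label_prior)]


def predict(logits):
    """Return the index of the class with the largest logit."""
    return max(range(len(logits)), key=lambda c: logits[c])


# Made-up logits for one comment over two classes
# (0 = no gender bias, 1 = gender bias) from two hypothetical models.
model_a = [1.2, 0.9]
model_b = [-0.4, 0.5]

combined = ensemble_logits([model_a, model_b])
label_without_prior = predict(combined)
label_with_prior = predict(apply_label_prior(combined, [0.9, 0.1]))
```

In this toy case the two models disagree, the averaged logits favor class 1, and a strongly skewed prior flips the decision back to class 0, which illustrates why the choice of prior matters on imbalanced data.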

Keywords

Acknowledgments

This study was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1F1A1061433) and partly by the Synapsoft ATC+ grant funded by the Ministry of Trade, Industry and Energy (MOTIE, Korea).

