Generating adversarial examples on toxic comment detection

  • Son, Soohyun (Dept. of Computer Engineering, Hanyang University)
  • Lee, Sangkyun (Dept. of Computer Engineering, Hanyang University)
  • Published : 2019.10.30

Abstract

In this paper, we propose a method for generating adversarial examples against neural networks that detect toxic comments. Our data are represented as one-hot vectors, and we constrain the attack so that only one character may be modified. The position to change is the one where the input gradient is largest, which identifies the character that most strongly affects the model's decision. Although this constraint is much stronger than those used in image-based adversarial attacks, the method achieves a success rate of about 49%.
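
The single-character attack described above can be illustrated with a minimal sketch. It is not the authors' implementation. A toy linear scorer with made-up weights (`WEIGHTS`, `BIAS`) stands in for the neural network so that the input gradient can be written analytically. The abstract does not say how the replacement character is chosen, so the substitution rule here (pick the character with the lowest gradient value at the selected position) is an assumption, as are all function names.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
# Hypothetical per-character weights of a toy linear scorer (not from the paper).
WEIGHTS = {c: 0.0 for c in ALPHABET}
WEIGHTS.update({"x": 2.0, "z": 1.0, "a": -0.5})
BIAS = -1.0


def one_hot(text):
    """Encode each character as a one-hot row over ALPHABET."""
    return [[1.0 if c == a else 0.0 for a in ALPHABET] for c in text]


def toxicity(x):
    """Sigmoid of a linear score over the one-hot input."""
    logit = BIAS + sum(WEIGHTS[a] * v for row in x for a, v in zip(ALPHABET, row))
    return 1.0 / (1.0 + math.exp(-logit))


def input_gradient(x):
    """Analytic gradient of the toxicity probability w.r.t. every one-hot entry."""
    p = toxicity(x)
    return [[p * (1.0 - p) * WEIGHTS[a] for a in ALPHABET] for _ in x]


def attack(text):
    """Change exactly one character, chosen from the input gradient."""
    x = one_hot(text)
    grad = input_gradient(x)
    # Saliency of a position: gradient magnitude at its active one-hot entry.
    saliency = [abs(grad[i][ALPHABET.index(c)]) for i, c in enumerate(text)]
    pos = max(range(len(text)), key=saliency.__getitem__)
    # Assumed substitution rule: the character with the lowest gradient value.
    candidates = [a for a in ALPHABET if a != text[pos]]
    new_char = min(candidates, key=lambda a: grad[pos][ALPHABET.index(a)])
    return text[:pos] + new_char + text[pos + 1:], pos


original = "axbz"
adversarial, position = attack(original)
print(original, round(toxicity(one_hot(original)), 3))
print(adversarial, round(toxicity(one_hot(adversarial)), 3), "changed index", position)
```

In this toy setting the original string scores above 0.5 and the modified string, which differs in exactly one character, scores below it. With a real network the gradient would come from automatic differentiation rather than a closed form, and a single first-order step would not always flip the prediction, which is consistent with the reported success rate of about 49%.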

Keywords