Generating adversarial examples on toxic comment detection

Son, Soohyun;Lee, Sangkyun;

doi:10.3745/PKIPS.y2019m10a.795

Proceedings of the Korea Information Processing Society Conference (한국정보처리학회:학술대회논문집)

2019.10a / Pages 795-797 / 2019 / 2005-0011 (pISSN) / 2671-7298 (eISSN)

Korea Information Processing Society (한국정보처리학회)

악성 댓글 탐지기에 대한 대항 예제 생성

Son, Soohyun (Dept of Computer Engineering, Hanyang University) ;
Lee, Sangkyun (Dept of Computer Engineering, Hanyang University)

Published : 2019.10.30

https://doi.org/10.3745/PKIPS.y2019m10a.795

Abstract

In this paper, we propose a method for generating adversarial examples against neural networks for toxicity detection. Our data are represented as one-hot vectors, and we constrain the attack so that only one character may be modified. The position to change is found at the region of maximum input gradient, which indicates the character that most influences the model's decision. Although this constraint is much stronger than those used in image-based adversarial attacks, we achieve a success rate of about 49%.
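
The single-character attack described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the detector here is a toy logistic regression standing in for the neural network, and the names `ALPHABET`, `toxicity`, `input_gradient`, and `one_char_attack` are hypothetical. The abstract states only how the position is chosen, so two further steps are assumptions: scoring each position by the norm of its gradient row, and picking the replacement character by a first-order estimate of the drop in toxicity score.

```python
import numpy as np

# Hypothetical character set; the paper does not specify one.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def one_hot(text, alphabet=ALPHABET):
    """Encode text as a (length, alphabet size) one-hot matrix."""
    x = np.zeros((len(text), len(alphabet)))
    for t, ch in enumerate(text):
        x[t, alphabet.index(ch)] = 1.0
    return x


def toxicity(x, W, b):
    """Toy stand-in detector: logistic regression over the one-hot input."""
    return 1.0 / (1.0 + np.exp(-(np.sum(W * x) + b)))


def input_gradient(x, W, b):
    """Gradient of the toxicity score with respect to the one-hot input."""
    s = toxicity(x, W, b)
    return s * (1.0 - s) * W


def one_char_attack(text, W, b, alphabet=ALPHABET):
    """Change exactly one character, at the position of largest input gradient."""
    x = one_hot(text, alphabet)
    grad = input_gradient(x, W, b)
    # Assumption: score each position by the L2 norm of its gradient row.
    pos = int(np.argmax(np.linalg.norm(grad, axis=1)))
    current = alphabet.index(text[pos])
    # Assumption: first-order estimate of the score change for each substitute.
    delta = grad[pos] - grad[pos, current]
    delta[current] = np.inf  # the character must actually change
    new = int(np.argmin(delta))
    return text[:pos] + alphabet[new] + text[pos + 1:], pos
```

With a weight matrix `W` of shape `(len(text), len(ALPHABET))`, `one_char_attack(text, W, b)` returns the modified text and the index that was changed. The attack counts as successful when the detector's score for the modified text falls below its decision threshold.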
