Detecting Adversarial Example Using Ensemble Method on Deep Neural Network

A Study on Ensemble Defense against Adversarial Examples in Deep Neural Networks

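  • 권현 (Department of Electronic Engineering, Korea Military Academy)
  • 윤준혁 (Department of Electrical and Computer Engineering, Seoul National University)
  • 김준섭 (Department of Electronic Engineering, Korea Military Academy)
  • 박상준 (Department of Electronic Engineering, Korea Military Academy)
  • 김용철 (Department of Electronic Engineering, Korea Military Academy)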
  • Received : 2021.04.15
  • Accepted : 2021.06.28
  • Published : 2021.06.30

Abstract

Deep neural networks (DNNs) provide excellent performance in image, speech, and pattern recognition. However, DNNs are vulnerable to adversarial examples. An adversarial example is a sample created by adding optimized noise to the original data so that the DNN misclassifies it, even though it looks unchanged to the human eye. Studies on defenses against adversarial example attacks are therefore required. In this paper, we experimentally analyze the detection rate for adversarial examples while adjusting various parameters. The performance of the ensemble defense method is analyzed against three adversarial example attack methods: the fast gradient sign method, the DeepFool method, and the Carlini & Wagner method. We use MNIST as the experimental dataset and TensorFlow as the machine learning library. The performance analysis varies four factors: the attack method, the threshold, the number of models, and the random noise. With 7 models and a threshold of 1, the ensemble defense method achieves a detection rate of 98.3% for adversarial examples while maintaining 99.2% accuracy on the original samples.

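Deep neural networks are among the representative deep learning models and show good performance in image recognition, speech recognition, and pattern recognition. However, they are vulnerable to misrecognizing adversarial examples. An adversarial example is a sample made by adding minimal noise to the original data so that it looks normal to a human but is misrecognized by the deep neural network. In applications that rely on deep neural networks, such as autonomous vehicles and medical services, misrecognizing a traffic sign or a patient diagnosis can cause a serious accident, so research on defenses against adversarial example attacks is required. In this paper, we experimentally analyze an ensemble defense method against adversarial examples while adjusting several parameters. The performance of the ensemble defense method is analyzed using the fast gradient sign method, the DeepFool method, and the Carlini & Wagner method to generate adversarial examples. The MNIST dataset is used as the experimental data, and TensorFlow is used as the machine learning library. Performance is analyzed with respect to the three adversarial example attack methods, the threshold, the number of models, and the random noise. In the experiments, with 7 models and a threshold of 1, the ensemble defense method achieves a detection success rate of 98.3% for adversarial examples while maintaining 99.2% accuracy on the original samples.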
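
The abstract does not spell out the detection rule, so the following is only a minimal sketch of one plausible reading: an input is flagged as adversarial when at least `threshold` models in the ensemble disagree with the majority label. The function name `detect_by_disagreement`, the disagreement-count rule, and the toy label matrix are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def detect_by_disagreement(predicted_labels, threshold=1):
    """Flag inputs on which an ensemble disagrees (hypothetical rule).

    predicted_labels: integer array of shape (n_models, n_samples),
        one row of predicted class labels per model.
    threshold: minimum number of models that must differ from the
        majority label for the input to be flagged (assumed meaning).
    Returns (flags, majority): a boolean array marking flagged inputs
    and the majority-vote label for each input.
    """
    labels = np.asarray(predicted_labels)
    n_models, n_samples = labels.shape
    flags = np.zeros(n_samples, dtype=bool)
    majority = np.zeros(n_samples, dtype=labels.dtype)
    for i in range(n_samples):
        values, counts = np.unique(labels[:, i], return_counts=True)
        majority[i] = values[np.argmax(counts)]
        # Number of models that did not vote for the majority label.
        disagreeing = n_models - counts.max()
        flags[i] = disagreeing >= threshold
    return flags, majority

# Toy predictions from 7 models on 3 inputs (made-up values).
toy_labels = np.array([
    [3, 5, 1],
    [3, 5, 1],
    [3, 8, 1],
    [3, 5, 1],
    [3, 5, 7],
    [3, 3, 1],
    [3, 5, 1],
])
flags, majority = detect_by_disagreement(toy_labels, threshold=1)
```

In this toy case the first input is accepted because all 7 models agree, while the other two are flagged because at least one model disagrees. Under this reading, raising `threshold` makes the detector more tolerant of disagreement among the models.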

Acknowledgement

This work was supported by the Hwarangdae Research Institute through its 2021 writing activity fund (21-군학-3).
