Grad-CAM based deep learning network for location detection of the main object

  • Kim, Seon-Jin (Department of Information and Communication Engineering, Chung-buk National University) ;
  • Lee, Jong-Keun (Department of Information and Communication Engineering, Chung-buk National University) ;
  • Kwak, Nae-Jung (Department of Information and Communication Engineering, Chung-buk National University) ;
  • Ryu, Sung-Pil (Department of Information and Communication Engineering, Chung-buk National University) ;
  • Ahn, Jae-Hyeong (Department of Information and Communication Engineering, Chung-buk National University)
  • Received : 2019.12.13
  • Accepted : 2019.12.24
  • Published : 2020.02.29

Abstract

In this paper, we propose an optimal deep learning network architecture for detecting the location of the main object through weakly supervised learning. The proposed network adds convolution blocks to improve the localization accuracy of the main object. The additional network consists of five additional blocks that add convolutional layers on top of a VGG-16 base, and it is trained by weakly supervised learning, which does not require ground-truth location information for objects. For localization, we use Grad-CAM, a weakly supervised method that removes CAM's drawback of requiring a global average pooling (GAP) layer. The proposed network was tested on the CUB-200-2011 data set and achieved a top-1 localization error of 50.13%. The proposed network also detects the main object more accurately than the existing method.
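
The localization step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. It assumes that the feature maps of the last convolutional layer and the gradients of the class score with respect to them have already been extracted from the network. The function names (`grad_cam`, `heatmap_to_box`, `iou`, `top1_loc_correct`) are hypothetical, the 0.2 relative threshold and the single tight box around all above-threshold cells are simplifying assumptions, and the IoU >= 0.5 rule is the criterion commonly used for top-1 localization, which the abstract itself does not spell out.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from K feature maps and their gradients.

    activations[k][i][j] is feature map k at position (i, j);
    gradients[k][i][j] is d(class score) / d(activations[k][i][j]).
    """
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        # Channel weight: spatial average of the gradient (no GAP layer needed).
        alpha = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only the regions that raise the class score.
    return [[max(value, 0.0) for value in row] for row in cam]


def heatmap_to_box(cam, rel_threshold=0.2):
    """Tight (x0, y0, x1, y1) box around cells >= rel_threshold * max.

    The max corner is exclusive. Assumption: one box around all hot cells.
    """
    peak = max(max(row) for row in cam)
    if peak <= 0.0:
        return None  # empty heatmap, nothing to localize
    cells = [(j, i) for i, row in enumerate(cam)
             for j, value in enumerate(row) if value >= rel_threshold * peak]
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def top1_loc_correct(pred_label, true_label, pred_box, true_box, iou_min=0.5):
    """A sample counts as correct if the class matches and IoU >= iou_min."""
    return (pred_label == true_label and pred_box is not None
            and iou(pred_box, true_box) >= iou_min)
```

Under these assumptions, the top-1 localization error over a test set would be the fraction of images for which `top1_loc_correct` returns `False`. The ground-truth box is used only for this evaluation and plays no role in training.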
