Semantic Object Segmentation Using Conditional Generative Adversarial Network with Residual Connections


  • Ibrahem, Hatem (School of Information and Communication Engineering, Chung-Buk National University) ;
  • Salem, Ahmed (School of Information and Communication Engineering, Chung-Buk National University) ;
  • Yagoub, Bilel (School of Information and Communication Engineering, Chung-Buk National University) ;
  • Kang, Hyun Su (School of Information and Communication Engineering, Chung-Buk National University) ;
  • Suh, Jae-Won (School of Electronics Engineering, Chung-Buk National University)
  • Received : 2022.11.22
  • Accepted : 2022.12.04
  • Published : 2022.12.31

Abstract

In this paper, we propose an image-to-image translation approach to semantic segmentation based on a conditional generative adversarial network. Semantic segmentation is the task of clustering together the parts of an image that belong to the same object class. Unlike the traditional pixel-wise classification approach, the proposed method parses an input RGB image into its corresponding semantic segmentation mask using a pixel regression approach. The method builds on the Pix2Pix image synthesis framework and employs convolutional neural network architectures with residual connections for both the generator and the discriminator, as the residual connections speed up training and produce more accurate results. The proposed method was trained and tested on the NYU-depthV2 dataset and achieves a good mean intersection over union (mIOU) value of 49.5%. We also compare the proposed approach with current semantic segmentation methods and show that it outperforms most of them.
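The paper's generator and discriminator are not reproduced here, but the building block the abstract highlights, a convolutional residual connection whose output is the block's input plus a learned transformation, can be sketched in plain NumPy. The single-channel 3x3 convolution, shapes, and two-convolution layout below are illustrative assumptions, not the authors' architecture:

```python
import numpy as np

def conv2d_same(x, w):
    """Naive single-channel 2-D cross-correlation with zero padding ('same' output size)."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero-pad so the output keeps x's shape
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * w)
    return out

def residual_block(x, w1, w2):
    """y = x + conv(relu(conv(x))): the identity shortcut lets gradients
    bypass the convolutions, which is why residual connections speed up training."""
    h = np.maximum(conv2d_same(x, w1), 0.0)   # convolution + ReLU
    return x + conv2d_same(h, w2)             # add the skip connection

x = np.random.default_rng(0).standard_normal((8, 8))
w1 = np.zeros((3, 3))
w2 = np.zeros((3, 3))
# With zero weights the learned branch contributes nothing,
# so the block reduces to the identity mapping: y == x.
y = residual_block(x, w1, w2)
```

The identity case at the end illustrates the design motivation: a residual block can always fall back to passing its input through unchanged, so stacking many of them does not degrade training the way plain deep stacks can.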

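For context, Pix2Pix (reference 3), on which the proposed method is based, trains a generator G against a discriminator D with a combined conditional-adversarial and L1 reconstruction objective, with λ weighting the reconstruction term:

```latex
G^* = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)

\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_1\big]
```

Here x is the conditioning input (the RGB image in this paper), y the target output (the segmentation mask), and z the generator's noise input; the L1 term is what makes the mapping a pixel regression rather than a pixel-wise classification.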


Acknowledgement

This work was supported by National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIT) (No. 2020R1A2C1007571 and No. 2022R1A5A8026986).

References

  1. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Networks," Communications of the ACM, vol. 63, no. 11, pp. 139-144, Nov. 2020. https://doi.org/10.1145/3422622
  2. A. Radford, L. Metz, and S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," arXiv:1511.06434, 2015.
  3. P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu: HI, USA, pp. 1125-1134, 2017.
  4. J. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks," in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 2223-2232, 2017.
  5. J. Long, E. Shelhamer, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston: MA, USA, pp. 3431-3440, 2015.
  6. V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481-2495, Dec. 2017. https://doi.org/10.1109/TPAMI.2016.2644615
  7. O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, pp. 234-241, 2015.
  8. L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, Apr. 2018. https://doi.org/10.1109/TPAMI.2017.2699184
  9. H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid Scene Parsing Network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu: HI, USA, pp. 2882-2890, 2017.
  10. J. Lafferty, A. McCallum, and F. Pereira, "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data," in International Conference on Machine Learning, San Francisco: CA, USA, 2001.
  11. H. Zhao, X. Qi, X. Shen, J. Shi, and J. Jia, "ICNet for Real-Time Semantic Segmentation on High-Resolution Images," in Proceedings of the European Conference on Computer Vision, Munich, Germany, pp. 405-420, 2018.
  12. Q. Li, W. Yang, W. Liu, Y. Yu, and S. He, "From Contexts to Locality: Ultra-high Resolution Image Segmentation via Locality-aware Contextual Correlation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal: QC, Canada, pp. 7252-7261, 2021.
  13. T. Shen, Y. Zhang, L. Qi, J. Kuen, X. Xie, J. Wu, Z. Lin, and J. Jia, "High Quality Segmentation for Ultra High-resolution Images," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans: LA, USA, pp. 1310-1319, 2022.
  14. T. Karras, S. Laine, and T. Aila, "A Style-Based Generator Architecture for Generative Adversarial Networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach: CA, USA, pp. 4401-4410, 2019.
  15. H. Ibrahem, A. Salem, and H. S. Kang, "Exploration of Semantic Label Decomposition and Dataset Size in Semantic Indoor Scenes Synthesis via Optimized Residual Generative Adversarial Networks," Sensors, vol. 22, no. 21, p. 8306, 2022. https://doi.org/10.3390/s22218306
  16. N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor Segmentation and Support Inference from RGBD Images," in European Conference on Computer Vision, Florence, Italy, pp. 746-760, 2012.
  17. G. Lin, A. Milan, C. Shen, and I. Reid, "RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu: HI, USA, pp. 1925-1934, 2017.
  18. D. Lin, G. Chen, D. Cohen-Or, P. Heng, and H. Huang, "Cascaded Feature Network for Semantic Segmentation of RGB-D Images," in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 1311-1319, 2017.
  19. P. Bilinski and V. Prisacariu, "Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City: UT, USA, pp. 6596-6605, 2018.
  20. X. Hu, K. Yang, L. Fei, and K. Wang, "ACNET: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation," in IEEE International Conference on Image Processing, Taipei, Taiwan, pp. 1440-1444, 2019.
  21. S. Vandenhende, S. Georgoulis, and L. V. Gool, "MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning," in European Conference on Computer Vision, Glasgow, UK, pp. 527-543, 2020.
  22. C. Du, T. Li, Y. Liu, Z. Wen, T. Hua, Y. Wang, and H. Zhao, "Improving Multi-Modal Learning with Uni-Modal Teachers," arXiv preprint arXiv:2106.11059, 2021.