Semantic Image Segmentation Combining Image-level and Pixel-level Classification

영상수준과 픽셀수준 분류를 결합한 영상 의미분할

  • Kim, Seon Kuk (Dept. of Electronic and Computer Engineering, Graduate School, Chonnam National University)
  • Lee, Chil Woo (Dept. of Electronic and Computer Engineering, Chonnam National University)
  • Received : 2018.08.09
  • Accepted : 2018.11.26
  • Published : 2018.12.31

Abstract

In this paper, we propose a CNN-based deep learning algorithm for semantic segmentation of images. To improve segmentation accuracy, we combine pixel-level object classification with image-level object classification. Image-level classification captures the characteristics of the image as a whole, while pixel-level classification indicates which object region each pixel belongs to. The proposed network consists of three parts: a part that extracts image features, a part that outputs the final result at the resolution of the original image, and a part that performs image-level object classification. Each classification task has its own loss function: image-level classification uses KL divergence, and pixel-level classification uses cross-entropy. In addition, layers of the feature-extraction network are combined with the layers of matching resolution in the segmentation network, in order to recover the feature position information and object boundary information lost through pooling operations.
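The two loss terms named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Several details are assumptions, since the abstract does not specify them: the image-level target is taken to be the normalized histogram of ground-truth pixel labels, the two losses are simply summed, and the balancing factor `weight` and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_cross_entropy(pixel_logits, pixel_labels, eps=1e-12):
    """Mean cross-entropy over pixels (pixel-level classification loss)."""
    losses = []
    for logits, label in zip(pixel_logits, pixel_labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[label] + eps))
    return sum(losses) / len(losses)


def label_histogram(pixel_labels, num_classes):
    """ASSUMPTION: image-level target = fraction of pixels per class."""
    counts = [0] * num_classes
    for label in pixel_labels:
        counts[label] += 1
    return [c / len(pixel_labels) for c in counts]


def image_kl_divergence(target_dist, image_logits, eps=1e-12):
    """KL(target || predicted) for the image-level class distribution."""
    pred = softmax(image_logits)
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target_dist, pred)
        if t > 0  # terms with zero target probability contribute nothing
    )


def combined_loss(pixel_logits, pixel_labels, image_logits, num_classes, weight=1.0):
    """ASSUMPTION: total loss = pixel cross-entropy + weight * image-level KL."""
    target = label_histogram(pixel_labels, num_classes)
    return (pixel_cross_entropy(pixel_logits, pixel_labels)
            + weight * image_kl_divergence(target, image_logits))


# Toy example: 4 pixels, 3 classes, uninformative (all-zero) logits.
pixel_logits = [[0.0, 0.0, 0.0] for _ in range(4)]
pixel_labels = [0, 0, 1, 2]
image_logits = [0.0, 0.0, 0.0]
total = combined_loss(pixel_logits, pixel_labels, image_logits, num_classes=3)
```

In this toy case the KL term is zero only when the predicted image-level distribution matches the label histogram, so it penalizes predictions whose overall class mix disagrees with the image content.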

Fig. 1. The proposed network architecture. (a) A convolutional network that extracts image features; (b) a segmentation network that upsamples the low-resolution layers to the resolution of the original image; (c) a network that performs image-level object classification; (d) layers in the segmentation network that are combined with the corresponding layers of the convolutional network.
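The idea behind part (d) can be illustrated with a toy pure-Python sketch on a single-channel feature map. Pooling discards the exact position of an activation, and pairing the upsampled map with the encoder map of the same resolution makes that position available again. This is an illustration under assumptions, not the network in the figure: nearest-neighbor upsampling and channel stacking are chosen here for simplicity, and all function names are hypothetical.

```python
def max_pool_2x2(fmap):
    """Downsample a 2D map (even height and width) by the max of each 2x2 block."""
    return [
        [max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
         for x in range(0, len(fmap[0]), 2)]
        for y in range(0, len(fmap), 2)
    ]


def upsample_nearest_2x(fmap):
    """Double the resolution by repeating every value in a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def combine_channels(encoder_map, decoder_map):
    """Stack two same-resolution maps: each pixel becomes (encoder, decoder)."""
    assert len(encoder_map) == len(decoder_map)
    assert len(encoder_map[0]) == len(decoder_map[0])
    return [list(zip(e_row, d_row))
            for e_row, d_row in zip(encoder_map, decoder_map)]


# A 4x4 encoder feature map with two sharply localized activations.
encoder_map = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 5],
]

pooled = max_pool_2x2(encoder_map)          # 2x2: exact positions are lost
upsampled = upsample_nearest_2x(pooled)     # 4x4 again, but blurred into blocks
combined = combine_channels(encoder_map, upsampled)
```

After upsampling alone, the activation of 9 is smeared over a whole 2x2 block. In `combined`, the encoder channel still marks the single pixel where it occurred, which is the kind of position and boundary information the combination is meant to preserve.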

Fig. 2. Comparison with the results of existing studies on sample images.

Table 1. Comparison of accuracy with existing studies

Table 2. Comparison of accuracy for each object class
