Efficient Inference of Image Objects using Semantic Segmentation

Lim, Heonyeong;Lee, Yurim;Jee, Minkyu;Go, Myunghyun;Kim, Hakdong;Kim, Wonil;

doi:10.5909/JBE.2019.24.1.67

Journal of Broadcast Engineering (방송공학회논문지)

Volume 24, Issue 1, pp. 67-76, 2019
pISSN 1226-7953, eISSN 2287-9137

The Korean Institute of Broadcast and Media Engineers (한국방송∙미디어공학회)

시멘틱 세그멘테이션을 활용한 이미지 오브젝트의 효율적인 영역 추론

Lim, Heonyeong (Department of Digital Contents, Sejong University) ;
Lee, Yurim (Department of Artificial Intelligence and Linguistic Engineering, Sejong University) ;
Jee, Minkyu (Department of Software Convergence, Sejong University) ;
Go, Myunghyun (Department of Digital Contents, Sejong University) ;
Kim, Hakdong (Department of Digital Contents, Sejong University) ;
Kim, Wonil (Department of Software, Sejong University)

임헌영 (세종대학교 디지털콘텐츠학과) ;
이유림 (세종대학교 인공지능언어공학과) ;
지민규 (세종대학교 소프트웨어융합학과) ;
고명현 (세종대학교 디지털콘텐츠학과) ;
김학동 (세종대학교 디지털콘텐츠학과) ;
김원일 (세종대학교 소프트웨어학과)

Received : 2018.11.15
Accepted : 2018.12.31
Published : 2019.01.30

https://doi.org/10.5909/JBE.2019.24.1.67

Abstract

In this paper, we study an efficient method for classifying the region of each object in multi-labeled image data using semantic segmentation. Beyond the pixel-level information contained in an image, such as color, contour, contrast, and saturation, and the associated processing techniques, we extract the detailed region in which each object is located as a meaningful unit, run experiments that reflect this region in the inference result, and discuss the outcome. Using a neural network with proven performance in image classification, we determine which object is located where in highly irregular image data that contains objects of various classes. Building on this work, we aim to provide artificial intelligence services that classify the detailed regions of complex, multi-object images in real time.
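As a minimal illustration of what determining "which object is located where" means at the output of a semantic segmentation network, the Python sketch below takes per-pixel class scores, picks the most likely class for each pixel, and reports the pixel count and bounding box of every class found. The class names, the score layout (height x width x classes), and the helper functions `argmax_mask` and `locate_objects` are assumptions made for this example; they are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical class ids, chosen only for this illustration.
CLASS_NAMES = {0: "background", 1: "person", 2: "bag"}

BBox = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def argmax_mask(scores: List[List[List[float]]]) -> List[List[int]]:
    """Turn per-pixel class scores (H x W x C) into a label mask (H x W)."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


def locate_objects(mask: List[List[int]]) -> Dict[int, Tuple[int, BBox]]:
    """For each class id in the mask, return (pixel_count, bounding_box)."""
    found: Dict[int, List[int]] = {}
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label not in found:
                # [count, top, left, bottom, right]
                found[label] = [0, r, c, r, c]
            entry = found[label]
            entry[0] += 1
            entry[1] = min(entry[1], r)
            entry[2] = min(entry[2], c)
            entry[3] = max(entry[3], r)
            entry[4] = max(entry[4], c)
    return {k: (v[0], (v[1], v[2], v[3], v[4])) for k, v in found.items()}


# A 2 x 3 toy image with three class scores per pixel.
toy_scores = [
    [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1]],
    [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]],
]
toy_mask = argmax_mask(toy_scores)
toy_regions = locate_objects(toy_mask)
```

A per-class bounding box is only one possible summary; because the mask labels every pixel, irregular object outlines remain available for finer-grained region inference.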

Fig. 1. Example of a convolutional neural network architecture (image classification)

Fig. 2. U-Net architecture based on the VGG11 model (semantic segmentation) [8]

Fig. 3. Case in which the features of a fashion item were lost during downsampling (FCN-based architecture)

Fig. 4. Concept of dense connectivity in a dense block

Fig. 5. Sample of the data set

Fig. 6. Examples of semantic segmentation results

Fig. 7. Comparison of results with the FCN-based architecture

Fig. 8. Case in which the background and person regions were classified successfully, but the details of the object were not inferred

Table 1. Fully convolutional DenseNet layers

Table 2. Data set description

Table 3. Training results by data set

Table 4. Training parameters
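The dense connectivity named in the Fig. 4 caption can be sketched in a few lines of Python: each layer in a dense block reads the concatenation of the block input and all earlier layer outputs, and its own output is appended rather than replacing what came before. The toy layer below is a hypothetical stand-in for a real convolutional layer, and the growth rate and layer count are arbitrary illustration values, not the settings in Table 1.

```python
from typing import Callable, List

Feature = List[float]  # feature channels at a single pixel, for simplicity


def dense_block(x: Feature, layers: List[Callable[[Feature], Feature]]) -> Feature:
    """Apply layers with dense connectivity.

    Layer i receives [x, out_1, ..., out_(i-1)] and its output is
    concatenated onto that running list.
    """
    features = list(x)
    for layer in layers:
        new = layer(features)      # the layer sees everything produced so far
        features = features + new  # concatenate instead of overwrite
    return features


def make_toy_layer(growth_rate: int) -> Callable[[Feature], Feature]:
    """Hypothetical stand-in for a conv layer: emits growth_rate channels."""
    def layer(inputs: Feature) -> Feature:
        mean = sum(inputs) / len(inputs)
        # ReLU-like clamp on a simple function of all inputs.
        return [max(0.0, mean + j) for j in range(growth_rate)]
    return layer


block_input = [1.0, 3.0]                          # 2 input channels
toy_layers = [make_toy_layer(2) for _ in range(3)]  # 3 layers, growth rate 2
block_output = dense_block(block_input, toy_layers)
```

With 2 input channels, 3 layers, and a growth rate of 2, the block output has 2 + 3 * 2 = 8 channels, which shows how the channel count grows linearly with depth under this connectivity pattern.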

References

[1] M. Thoma, "A survey of semantic segmentation," CoRR, vol. abs/1602.06541, 2016. [Online]. Available: http://arxiv.org/abs/1602.06541
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in NIPS, 2012, pp. 1106-1114.
[3] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3431-3440.
[4] X. Liang, L. Lin, W. Yang, P. Luo, J. Huang, and S. Yan, "Clothes co-parsing via joint image segmentation and labeling with application to clothing retrieval," IEEE Transactions on Multimedia, 2016, pp. 1175-1186. https://doi.org/10.1109/TMM.2016.2542983
[5] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[6] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in N. Navab, J. Hornegger, W. Wells, and A. Frangi (eds.), Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, Lecture Notes in Computer Science, vol. 9351, Springer, Cham.
[7] S. Jégou, M. Drozdzal, D. Vazquez, et al., "The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation," in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[8] V. Iglovikov and A. Shvets, "TernausNet: U-Net with VGG11 encoder pre-trained on ImageNet for image segmentation," arXiv preprint arXiv:1801.05746, 2018.
[9] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in ICML, 2015, pp. 448-456.
[10] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. 27th International Conference on Machine Learning, 2010.
[11] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," JMLR, 2014, pp. 1929-1958.
[12] R. Caruana, S. Lawrence, and L. Giles, "Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping," in Advances in Neural Information Processing Systems, 2001, vol. 13, pp. 402-408.
[13] G. J. Brostow, J. Fauqueur, and R. Cipolla, "Semantic object classes in video: A high-definition ground truth database," Pattern Recognition Letters, 2009, vol. 30, no. 2, pp. 88-97. https://doi.org/10.1016/j.patrec.2008.04.005
[14] J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proceedings of the 18th International Conference on Machine Learning (ICML 2001), 2001, pp. 282-289.
