Material Image Classification using Normal Map Generation

  • Hyeongil Nam (Department of Computer Software, Hanyang University) ;
  • Taehyun Kim (Department of Computer Software, Hanyang University) ;
  • Jong-Il Park (Department of Computer Software, Hanyang University)
  • Received : 2021.12.02
  • Accepted : 2022.01.19
  • Published : 2022.01.30

Abstract

In this study, we propose a method that generates a normal map image, which represents the surface characteristics of a material in an image, and uses it to improve the classification accuracy of the original material image. First, (1) to generate a normal map that reflects the surface properties of the material in an image, we use a Pix2Pix-based method whose generator is a U-Net with Attention-R2 gates and whose reconstruction loss is the similarity between the generated normal map and the ground-truth normal map. Next, (2) we propose a network that improves classification accuracy on the original material image by applying the generated normal map to the attention gate of the classification network. Normal maps generated from the Pixar dataset are evaluated for similarity against their ground-truth counterparts, and the results of reconstruction loss functions built on different similarity metrics are compared. In addition, comparative experiments against previous studies on the MINC-2500 and FMD datasets confirm that the proposed method classifies material images more accurately. The method proposed in this paper is expected to serve as a foundation for various image-processing methods and networks that identify materials within an image.
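The generation step (1) amounts to a Pix2Pix-style generator objective: an adversarial term plus a reconstruction term measuring the similarity between the generated and ground-truth normal maps. The sketch below is a minimal NumPy illustration of that combined loss, assuming a global (non-windowed) SSIM as the similarity metric; the function names, the weighting factor `lam`, and the logit-based adversarial term are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Global (single-window) SSIM between two images scaled to [0, data_range]."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

def generator_loss(fake_logits, fake_map, real_map, lam=10.0):
    """Adversarial term (non-saturating BCE on the discriminator's logits for
    generated normal maps) plus a similarity-based reconstruction term."""
    adv = np.mean(np.log1p(np.exp(-fake_logits)))   # softplus(-z) = -log sigmoid(z)
    recon = 1.0 - global_ssim(fake_map, real_map)   # 0 when the maps are identical
    return adv + lam * recon
```

Swapping `global_ssim` for another similarity measure (e.g. an L1 or perceptual distance) changes only the `recon` term, which is how reconstruction losses built on different similarity metrics can be compared against one another.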

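The classification step (2) feeds normal-map features into an additive attention gate that re-weights the image features of the classification network. Below is a minimal NumPy sketch of such a gate; the 1x1-convolution weights (`w_x`, `w_g`, `psi`) and feature shapes are hypothetical and only illustrate the gating mechanism, not the paper's actual architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x_feat, g_feat, w_x, w_g, psi):
    """Gate image features with normal-map features (additive attention).

    x_feat   : (C, H, W) image feature map
    g_feat   : (C, H, W) normal-map feature map (the gating signal)
    w_x, w_g : (C_int, C) 1x1-convolution weights
    psi      : (C_int,) weights collapsing the joint features to one channel
    """
    # Project both feature maps to an intermediate space, combine, and apply ReLU.
    q = np.maximum(0.0, np.tensordot(w_x, x_feat, axes=([1], [0]))
                        + np.tensordot(w_g, g_feat, axes=([1], [0])))
    # Per-pixel attention coefficients in (0, 1).
    alpha = sigmoid(np.tensordot(psi, q, axes=([0], [0])))  # (H, W)
    return x_feat * alpha[None, :, :], alpha
```

The gated features replace the plain image features at the corresponding layer of the classifier, so pixels whose surface geometry is informative are emphasized while the rest are attenuated.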

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1A4A1029800).
