Progressive occupancy network for 3D reconstruction

  • Kim, Yonggyu (School of Computer Engineering, KOREATECH (Korea University of Technology and Education)) ;
  • Kim, Duksu (School of Computer Engineering, KOREATECH (Korea University of Technology and Education))
  • Received : 2021.06.11
  • Accepted : 2021.06.25
  • Published : 2021.07.23

Abstract

3D reconstruction refers to recovering the 3D shape of an object from an image or a video. We propose a progressive occupancy network that can recover not only the overall shape of an object but also its local details. Unlike the original occupancy network, which uses a single feature vector embedding the information of the whole image, our network extracts and utilizes image features at different levels according to the receptive field size. We also propose a novel network architecture that applies these image features sequentially to the decoder blocks, progressively improving the quality of the reconstructed 3D shape, along with a novel decoder block structure that properly combines the different levels of image features and uses them to update the input point feature. We trained our progressive occupancy network on ShapeNet and compared its representation power with two prior methods: the original occupancy network (ONet) and a recent work (DISN) that, like ours, uses different levels of image features. Our network outperforms ONet on all three evaluation metrics and achieves slightly better or comparable scores to DISN. In the visual comparison, our method successfully reconstructs the local details that ONet misses, and it captures the thin or occluded parts of an object that DISN fails to reconstruct. These results validate the usefulness of the proposed network architecture.
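
Since only the abstract is available here, the following is a minimal PyTorch sketch of the idea it describes, not the authors' implementation: an occupancy network maps a 3D query point, conditioned on image features, to an occupancy probability in [0, 1], and the progressive variant injects image features of successive receptive-field levels into successive decoder blocks that refine the per-point feature. All module names, layer dimensions, the residual fusion scheme, and the use of one global feature vector per level are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ProgressiveDecoderBlock(nn.Module):
    """Fuses one level of image features into the point feature (illustrative)."""
    def __init__(self, point_dim: int, img_feat_dim: int):
        super().__init__()
        self.img_proj = nn.Linear(img_feat_dim, point_dim)  # project image feature
        self.fc1 = nn.Linear(point_dim, point_dim)
        self.fc2 = nn.Linear(point_dim, point_dim)
        self.act = nn.ReLU()

    def forward(self, p_feat, img_feat):
        # Condition the point feature on this level's image feature,
        # then refine it with a small residual MLP (an assumed design).
        h = p_feat + self.img_proj(img_feat)
        return p_feat + self.fc2(self.act(self.fc1(h)))

class ProgressiveOccupancyNet(nn.Module):
    """Applies multi-level image features sequentially to decoder blocks."""
    def __init__(self, img_feat_dims=(64, 128, 256), point_dim=256):
        super().__init__()
        self.point_embed = nn.Linear(3, point_dim)  # embed (x, y, z) query points
        self.blocks = nn.ModuleList(
            ProgressiveDecoderBlock(point_dim, d) for d in img_feat_dims
        )
        self.out = nn.Linear(point_dim, 1)  # occupancy logit per point

    def forward(self, points, img_feats):
        # points: (B, N, 3); img_feats: list of (B, D_i) feature vectors,
        # ordered from coarse (large receptive field) to fine levels.
        h = self.point_embed(points)
        for block, f in zip(self.blocks, img_feats):
            h = block(h, f.unsqueeze(1))  # broadcast image feature over N points
        return torch.sigmoid(self.out(h)).squeeze(-1)  # occupancy in [0, 1]

# Usage with random tensors standing in for real encoder outputs:
net = ProgressiveOccupancyNet()
pts = torch.rand(2, 1024, 3)                        # 1024 query points per image
feats = [torch.rand(2, d) for d in (64, 128, 256)]  # one feature per level
occ = net(pts, feats)                               # (2, 1024) occupancy probabilities
```

As in ONet [11], a mesh can then be extracted from the predicted occupancy field, typically with Marching Cubes [18].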

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education in 2021 (No. 2021R1I1A3048263).

References

  1. A. M. Andrew, "Multiple view geometry in computer vision," Kybernetes, 2001.
  2. H. Fan, H. Su, and L. J. Guibas, "A point set generation network for 3d object reconstruction from a single image," in Proc. of the IEEE conference on computer vision and pattern recognition, 2017, pp. 605-613.
  3. C. Häne, S. Tulsiani, and J. Malik, "Hierarchical surface prediction for 3d object reconstruction," in International Conference on 3D Vision (3DV), 2017, pp. 412-420.
  4. N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang, "Pixel2mesh: Generating 3d mesh models from single rgb images," in Proc. of the European Conference on Computer Vision (ECCV), 2018, pp. 52-67.
  5. Y.-P. Xiao, Y.-K. Lai, F.-L. Zhang, C. Li, and L. Gao, "A survey on deep geometry learning: From a representation perspective," Computational Visual Media, vol. 6, no. 2, pp. 113-133, 2020. https://doi.org/10.1007/s41095-020-0174-8
  6. C. B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese, "3d-r2n2: A unified approach for single and multi-view 3d object reconstruction," in European conference on computer vision, 2016, pp. 628-644.
  7. P. Achlioptas, O. Diamanti, I. Mitliagkas, and L. Guibas, "Learning representations and generative models for 3d point clouds," in International conference on machine learning. PMLR, 2018, pp. 40-49.
  8. G. Yang, X. Huang, Z. Hao, M.-Y. Liu, S. Belongie, and B. Hariharan, "Pointflow: 3d point cloud generation with continuous normalizing flows," in Proc. of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 4541-4550.
  9. W. Yifan, F. Serena, S. Wu, C. Oztireli, and O. Sorkine-Hornung, "Differentiable surface splatting for point-based geometry processing," ACM Transactions on Graphics (TOG), vol. 38, no. 6, pp. 1-14, 2019.
  10. H. Kato, Y. Ushiku, and T. Harada, "Neural 3d mesh renderer," in Proc. of the IEEE conference on computer vision and pattern recognition, 2018, pp. 3907-3916.
  11. L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, "Occupancy networks: Learning 3d reconstruction in function space," in Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4460-4470.
  12. A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, and H. Su, "ShapeNet: An information-rich 3d model repository," arXiv preprint arXiv:1512.03012, 2015.
  13. Q. Xu, W. Wang, D. Ceylan, R. Mech, and U. Neumann, "DISN: Deep implicit surface network for high-quality single-view 3d reconstruction," in Advances in Neural Information Processing Systems, vol. 32, 2019.
  14. T. Groueix, M. Fisher, V. G. Kim, B. C. Russell, and M. Aubry, "A papier-mache approach to learning 3d surface generation," in Proc. of the IEEE conference on computer vision and pattern recognition, 2018, pp. 216-224.
  15. M. Tatarchenko, A. Dosovitskiy, and T. Brox, "Octree generating networks: Efficient convolutional architectures for high-resolution 3d outputs," in Proc. of the IEEE International Conference on Computer Vision, 2017, pp. 2088-2096.
  16. G. Riegler, A. Osman Ulusoy, and A. Geiger, "Octnet: Learning deep 3d representations at high resolutions," in Proc. of the IEEE conference on computer vision and pattern recognition, 2017, pp. 3577-3586.
  17. J. Wu, C. Zhang, X. Zhang, Z. Zhang, W. T. Freeman, and J. B. Tenenbaum, "Learning shape priors for single-view 3d completion and reconstruction," in Proc. of the European Conference on Computer Vision (ECCV), 2018, pp. 646-662.
  18. W. E. Lorensen and H. E. Cline, "Marching cubes: A high resolution 3d surface construction algorithm," ACM siggraph computer graphics, vol. 21, no. 4, pp. 163-169, 1987. https://doi.org/10.1145/37402.37422
  19. S. Saito, Z. Huang, R. Natsume, S. Morishima, A. Kanazawa, and H. Li, "PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization," in Proc. of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 2304-2314.
  20. J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove, "Deepsdf: Learning continuous signed distance functions for shape representation," in Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 165-174.
  21. K. Genova, F. Cole, D. Vlasic, A. Sarna, W. T. Freeman, and T. Funkhouser, "Learning shape templates with structured implicit functions," in Proc. of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 7154-7164.