ASPPMVSNet: A high-receptive-field multiview stereo network for dense three-dimensional reconstruction

  • Saleh, Saeed (Department of Computer Science and Engineering, Sogang University)
  • Sungjun, Lee (Immersive Media Section, Electronics and Telecommunications Research Institute)
  • Yongju, Cho (Department of Computer Science and Engineering, Sogang University)
  • Unsang, Park (Department of Computer Science and Engineering, Sogang University)
  • Received : 2021.08.31
  • Accepted : 2022.03.29
  • Published : 2022.12.10

Abstract

Learning-based multiview stereo (MVS) methods for three-dimensional (3D) reconstruction generally use 3D volumes for depth inference. The quality of the reconstructed depth maps and the corresponding point clouds is directly influenced by the spatial resolution of the 3D volume. Because memory limits how much information such a volume can encode, these methods produce point clouds with sparse local regions. Here, we apply the atrous spatial pyramid pooling (ASPP) module in MVS methods to obtain dense feature maps that carry multiscale, long-range contextual information through large receptive fields. For a 3D volume with the same spatial resolution as that used in existing MVS methods, the richer feature maps from the ASPP module can produce dense point clouds without a high memory footprint. Furthermore, we propose a 3D loss for training MVS networks, which improves the predicted depth values by 24.44%. The ASPP module provides state-of-the-art qualitative results by constructing relatively dense point clouds, improving on the DTU MVS dataset benchmarks by 2.25% compared with previous MVS methods.
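To make the receptive-field claim concrete, the following minimal sketch computes how many input pixels a single atrous (dilated) convolution spans in each parallel branch of an ASPP-style module. The 3 × 3 kernel and the rates 1, 6, 12, and 18 are illustrative assumptions borrowed from common ASPP configurations, not values stated in this abstract, and the function names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Span, in input pixels, of one atrous convolution along one axis.

    A kernel with `kernel_size` taps and dilation `rate` leaves
    (rate - 1) gaps between neighbouring taps, so its span grows
    while the number of learned weights stays the same.
    """
    return kernel_size + (kernel_size - 1) * (rate - 1)


def aspp_branch_spans(kernel_size: int, rates) -> dict:
    """Map each dilation rate of a parallel ASPP branch to its span."""
    return {rate: effective_kernel_size(kernel_size, rate) for rate in rates}


if __name__ == "__main__":
    # Assumed configuration: 3x3 kernels with rates 1, 6, 12, 18.
    for rate, span in aspp_branch_spans(3, (1, 6, 12, 18)).items():
        print(f"rate={rate:2d}: 9 weights cover a {span}x{span} window")
```

Under these assumed rates, every branch still uses nine weights per channel pair, yet the widest branch sees a 37 × 37 window. This is the sense in which a feature map can gain long-range context without increasing the resolution, and hence the memory, of the 3D volume.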

Acknowledgement

This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government (22ZH1210, fundamental media contents technologies for hyper-realistic media space).
