Improved Sliding Shapes for Instance Segmentation of Amodal 3D Object

  • Lin, Jinhua (Computer Application Technology, Changchun University of Technology) ;
  • Yao, Yu (Computer Application Technology, Changchun University of Technology) ;
  • Wang, Yanjie (Machinery & Electronics Engineering, Chinese Academy of Sciences University)
  • Received : 2017.11.21
  • Accepted : 2018.04.05
  • Published : 2018.11.30

Abstract

State-of-the-art instance segmentation networks are successful at generating 2D segmentation masks for the region proposals with the highest classification scores, yet 3D object segmentation remains limited to geocentric embeddings or Sliding Shapes detectors. To address this, we propose an amodal 3D instance segmentation network called A3IS-CNN, which extends the Deep Sliding Shapes detector to amodal 3D instance segmentation by adding a new 3D ConvNet branch called the A3IS-branch. The A3IS-branch, which takes a 3D amodal ROI as input and outputs 3D semantic instances, is a fully convolutional network (FCN) that shares convolutional layers with the existing 3D RPN, which takes a 3D scene as input and outputs 3D amodal proposals. Because the two branches share computation, our 3D instance segmentation network adds only a small overhead of 0.25 fps to Deep Sliding Shapes, trading off between accurate detection and point-to-point segmentation of instances. Experiments show that our network achieves a 10% to 50% improvement in running time over the state-of-the-art network and outperforms the state-of-the-art 3D detectors by at least 16.1 AP.
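The computation sharing described in the abstract can be illustrated with a minimal pure-Python sketch. One shared 3D convolutional trunk runs once over the scene volume, and two heads reuse its features: a proposal-scoring head, standing in for the 3D RPN, and a per-voxel mask head, standing in for the A3IS-branch. Every name here is a hypothetical illustration, not the authors' A3IS-CNN implementation. This includes `conv3d_relu`, `crop`, `SharedTrunkModel`, the `(z0, y0, x0, z1, y1, x1)` box format, the single-channel kernel and the linear heads. A real network would use multi-channel learned 3D convolutions, box regression and 3D ROI pooling.

```python
import math
from typing import List, Tuple

Grid = List[List[List[float]]]             # voxel grid indexed as [z][y][x]
Box = Tuple[int, int, int, int, int, int]  # hypothetical (z0, y0, x0, z1, y1, x1), half-open


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def conv3d_relu(vol: Grid, kernel: Grid, bias: float = 0.0) -> Grid:
    """Single-channel 3D cross-correlation with zero 'same' padding, then ReLU.

    Assumes a cubic kernel with an odd side length.
    """
    d, h, w = len(vol), len(vol[0]), len(vol[0][0])
    k = len(kernel)
    r = k // 2
    out = [[[0.0] * w for _ in range(h)] for _ in range(d)]
    for z in range(d):
        for y in range(h):
            for x in range(w):
                s = bias
                for dz in range(k):
                    for dy in range(k):
                        for dx in range(k):
                            zz, yy, xx = z + dz - r, y + dy - r, x + dx - r
                            if 0 <= zz < d and 0 <= yy < h and 0 <= xx < w:
                                s += kernel[dz][dy][dx] * vol[zz][yy][xx]
                out[z][y][x] = max(0.0, s)
    return out


def crop(feat: Grid, box: Box) -> Grid:
    """Cut the ROI out of the shared feature volume (a stand-in for 3D ROI pooling)."""
    z0, y0, x0, z1, y1, x1 = box
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in feat[z0:z1]]


class SharedTrunkModel:
    """Hypothetical two-head model: one shared trunk, a proposal head and a mask head."""

    def __init__(self, kernel: Grid, rpn_w: float = 1.0, rpn_b: float = -0.5,
                 mask_w: float = 4.0, mask_b: float = -2.0):
        self.kernel = kernel
        self.rpn_w, self.rpn_b = rpn_w, rpn_b
        self.mask_w, self.mask_b = mask_w, mask_b
        self.trunk_calls = 0  # counts how often the expensive shared layers run

    def features(self, scene: Grid) -> Grid:
        self.trunk_calls += 1
        return conv3d_relu(scene, self.kernel)

    def rpn_score(self, feat: Grid, box: Box) -> float:
        """Objectness of a candidate box: sigmoid of a linear function of its mean feature."""
        vals = [v for plane in crop(feat, box) for row in plane for v in row]
        return sigmoid(self.rpn_w * sum(vals) / len(vals) + self.rpn_b)

    def mask_head(self, feat: Grid, box: Box) -> List[List[List[int]]]:
        """Per-voxel binary mask inside the ROI (a 1x1x1 'conv' plus a 0.5 threshold)."""
        return [[[int(sigmoid(self.mask_w * v + self.mask_b) > 0.5) for v in row]
                 for row in plane] for plane in crop(feat, box)]

    def forward(self, scene: Grid, anchors: List[Box], top_k: int = 1):
        feat = self.features(scene)  # computed once and reused by both heads
        scored = sorted(((self.rpn_score(feat, a), a) for a in anchors),
                        key=lambda t: t[0], reverse=True)
        return [(box, score, self.mask_head(feat, box)) for score, box in scored[:top_k]]


if __name__ == "__main__":
    n = 6
    scene = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for z in range(1, 3):
        for y in range(1, 3):
            for x in range(1, 3):
                scene[z][y][x] = 1.0  # a 2x2x2 occupied "object"
    identity = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    identity[1][1][1] = 1.0  # 3x3x3 kernel that passes the input through unchanged
    model = SharedTrunkModel(identity)
    for box, score, mask in model.forward(scene, [(0, 0, 0, 3, 3, 3), (3, 3, 3, 6, 6, 6)], top_k=2):
        print(box, round(score, 3))
    print("trunk runs:", model.trunk_calls)
```

In this sketch the trunk counter stays at one no matter how many anchors are scored or masked. That mirrors, in a very simplified form, why adding a mask branch on top of shared convolutional layers costs little extra time.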
