Improved Sliding Shapes for Instance Segmentation of Amodal 3D Object

Lin, Jinhua; Yao, Yu; Wang, Yanjie

doi:10.3837/tiis.2018.11.021

KSII Transactions on Internet and Information Systems (TIIS)

Vol. 12, No. 11 / pp. 5555-5567 / 2018 / 1976-7277 (pISSN) / 1976-7277 (eISSN)

Korean Society for Internet Information


Lin, Jinhua (Computer Application Technology, Changchun University of Technology);
Yao, Yu (Computer Application Technology, Changchun University of Technology);
Wang, Yanjie (Machinery & Electronics Engineering, Chinese Academy of Sciences University)

Received: 2017.11.21
Reviewed: 2018.04.05
Published: 2018.11.30

https://doi.org/10.3837/tiis.2018.11.021


Abstract

State-of-the-art instance segmentation networks successfully generate 2D segmentation masks for the region proposals with the highest classification scores, yet 3D object segmentation remains limited to geocentric embedding or the Sliding Shapes detector. To address this, we propose an amodal 3D instance segmentation network called A3IS-CNN, which extends the Deep Sliding Shapes detector to amodal 3D instance segmentation by adding a new 3D ConvNet branch called the A3IS-branch. The A3IS-branch, which takes 3D amodal ROIs as input and outputs 3D semantic instances, is a fully convolutional network (FCN) that shares convolutional layers with the existing 3D RPN, which takes a 3D scene as input and outputs 3D amodal proposals. Because the two branches share computation, our 3D instance segmentation network adds only a small overhead of 0.25 fps to Deep Sliding Shapes, balancing accurate detection against point-to-point segmentation of instances. Experiments show that our 3D instance segmentation network improves running time by 10% to 50% over the state-of-the-art network and outperforms state-of-the-art 3D detectors by at least 16.1 AP.
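
The computation sharing described in the abstract can be pictured with a minimal sketch. This is a toy illustration under stated assumptions, not the A3IS-CNN implementation: every name below (SharedBackbone, proposal_head, mask_head, segment_instances, Box3D) is hypothetical, the "features" are just the voxel values, and the thresholding heads stand in for learned 3D convolutional layers. It does not model amodal reasoning about occluded parts. It only shows the data flow: features are computed once per scene and reused by both the proposal branch and the per-ROI segmentation branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Hypothetical axis-aligned 3D box with half-open voxel bounds."""
    z0: int
    y0: int
    x0: int
    z1: int
    y1: int
    x1: int


class SharedBackbone:
    """Toy stand-in for the convolutional layers shared by both branches."""

    def __init__(self):
        self.calls = 0  # counts how often features are computed

    def __call__(self, scene):
        self.calls += 1
        # Assumption: the "feature" of a voxel is simply its value.
        return [[[float(v) for v in row] for row in plane] for plane in scene]


def proposal_head(features, threshold=0.5):
    """Toy stand-in for the 3D RPN: one box around all active voxels."""
    coords = [
        (z, y, x)
        for z, plane in enumerate(features)
        for y, row in enumerate(plane)
        for x, v in enumerate(row)
        if v > threshold
    ]
    if not coords:
        return []
    zs, ys, xs = zip(*coords)
    return [Box3D(min(zs), min(ys), min(xs),
                  max(zs) + 1, max(ys) + 1, max(xs) + 1)]


def mask_head(features, box, threshold=0.5):
    """Toy stand-in for the segmentation branch: per-voxel mask in the ROI."""
    return [
        [
            [features[z][y][x] > threshold for x in range(box.x0, box.x1)]
            for y in range(box.y0, box.y1)
        ]
        for z in range(box.z0, box.z1)
    ]


def segment_instances(scene, backbone):
    """Compute shared features once, then run both heads on them."""
    features = backbone(scene)
    proposals = proposal_head(features)
    return [(box, mask_head(features, box)) for box in proposals]
```

Calling segment_instances on a small nested-list voxel grid returns one (box, mask) pair per proposal, and backbone.calls stays at 1 regardless of how many proposals the mask head processes, which is the sense in which the second branch adds only a small overhead in this sketch.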


