Learning-Based Multiple Pooling Fusion in Multi-View Convolutional Neural Network for 3D Model Classification and Retrieval

  • Zeng, Hui (Beijing Engineering Research Center of Industrial Spectrum Imaging, School of Automation and Electrical Engineering, University of Science and Technology Beijing) ;
  • Wang, Qi (Beijing Engineering Research Center of Industrial Spectrum Imaging, School of Automation and Electrical Engineering, University of Science and Technology Beijing) ;
  • Li, Chen (School of Computer Science and Technology, North China University of Technology) ;
  • Song, Wei (School of Computer Science and Technology, North China University of Technology)
  • Received: 2017.06.09
  • Reviewed: 2018.01.28
  • Published: 2019.10.31

Abstract

We design a view-pooling method named learning-based multiple pooling fusion (LMPF) and apply it to the multi-view convolutional neural network (MVCNN) for 3D model classification and retrieval. With LMPF, the multi-view feature maps obtained from the projections of a 3D model are compiled into a simple and effective feature descriptor. LMPF fuses max pooling and mean pooling by learning a set of optimal weights. Compared with hand-crafted approaches such as max pooling and mean pooling, LMPF reduces information loss because its fusion weights are learned rather than fixed. Experiments on the ModelNet40 and McGill datasets show that LMPF outperforms these previous methods by a large margin.
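The abstract describes fusing max and mean view pooling with learned weights. The sketch below shows one way that idea could look. It rests on several assumptions that the abstract does not confirm:

- Each view has already been reduced to a flat feature vector.
- The fusion is a weighted sum with one scalar weight for max pooling and one for mean pooling, shared across all feature elements.
- The weights are fitted by plain gradient descent against a toy target vector. This stands in for whatever training signal the paper actually uses.

All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def max_view_pool(views: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum across views (hand-crafted max view pooling)."""
    return [max(col) for col in zip(*views)]


def mean_view_pool(views: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average across views (hand-crafted mean view pooling)."""
    if not views:
        raise ValueError("need at least one view")
    n = len(views)
    return [sum(col) / n for col in zip(*views)]


def fused_view_pool(views: Sequence[Sequence[float]],
                    w_max: float, w_mean: float) -> List[float]:
    """Weighted fusion of max and mean pooling.

    Assumption: a single pair of scalar weights shared by every element.
    """
    mx = max_view_pool(views)
    mn = mean_view_pool(views)
    return [w_max * a + w_mean * b for a, b in zip(mx, mn)]


def learn_weights(views: Sequence[Sequence[float]],
                  target: Sequence[float],
                  lr: float = 0.05,
                  steps: int = 10000,
                  init: Tuple[float, float] = (0.5, 0.5)) -> Tuple[float, float]:
    """Fit (w_max, w_mean) by gradient descent on 0.5 * ||fused - target||^2.

    Hypothetical training setup: a fixed target vector replaces the real
    task loss. lr must be small enough for the update to stay stable.
    """
    w_max, w_mean = init
    mx = max_view_pool(views)
    mn = mean_view_pool(views)
    for _ in range(steps):
        fused = [w_max * a + w_mean * b for a, b in zip(mx, mn)]
        residual = [f - t for f, t in zip(fused, target)]
        # dLoss/dw_max and dLoss/dw_mean: residual dotted with each pooled vector
        g_max = sum(r * a for r, a in zip(residual, mx))
        g_mean = sum(r * b for r, b in zip(residual, mn))
        w_max -= lr * g_max
        w_mean -= lr * g_mean
    return w_max, w_mean


if __name__ == "__main__":
    # Three views, each already reduced to a 3-element feature vector.
    views = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0], [2.0, 2.0, 1.0]]
    print(max_view_pool(views))   # [3.0, 2.0, 2.0]
    print(mean_view_pool(views))  # [2.0, 1.0, 1.0]
    target = fused_view_pool(views, 0.3, 0.7)
    print(learn_weights(views, target))  # approximately (0.3, 0.7)
```

In this sketch, setting the weights to (1, 0) or (0, 1) reproduces max or mean pooling exactly. The fixed hand-crafted schemes are therefore special cases of the fused form. Learning the weights lets the data choose a point between them.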
