Pose Estimation with Binarized Multi-Scale Module

  • Choi, Yong-Gyun (Department of Computer Engineering, Dongseo University)
  • Lee, Sukho (Department of Computer Engineering, Dongseo University)
  • Received : 2018.05.15
  • Accepted : 2018.05.28
  • Published : 2018.06.30


In this paper, we propose a binarized multi-scale module to accelerate a deep neural network for pose estimation. Deep learning has recently been applied to fine-grained tasks such as pose estimation. One of the best-performing pose estimation methods uses two neural networks: one computes the heat maps of the body parts, and the other computes the part affinity fields between the body parts. However, convolution with large kernel filters is time-consuming in this model. To accelerate the model, we propose replacing the large kernel filters with binarized multi-scale modules. The multi-scale structure captures the large receptive field and also prevents the drop in accuracy in the binarized module. The computational cost and the number of parameters are reduced, which results in increased speed.
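The trade-off described in the abstract can be illustrated with a small calculation. The following is a minimal sketch, not the configuration used in the paper: the channel width of 128, the 7x7 kernel, and the three-stage 3x3 layout are all hypothetical, and the binarization follows the sign-and-scale scheme commonly associated with XNOR-Net rather than anything stated in the abstract.

```python
# Illustrative sketch only: layer sizes and the three-stage 3x3 layout are
# assumptions, not the configuration reported in the paper.

def conv_params(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def receptive_field(kernels):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k in kernels:
        rf += k - 1
    return rf


def binarize(weights):
    """Sign-and-scale binarization: returns (alpha, signs).

    alpha is the mean absolute weight; signs holds +1 or -1 per weight,
    so each weight needs one bit of storage instead of 32.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


channels = 128  # hypothetical channel width

# One large real-valued 7x7 convolution.
large_params = conv_params(7, channels, channels)
large_bits = large_params * 32

# Three stacked 3x3 convolutions with binarized (1-bit) weights.
stacked_params = sum(conv_params(3, channels, channels) for _ in range(3))
stacked_bits = stacked_params * 1

# Receptive field seen after each stage of the stack; using the outputs of
# all stages together is one way to obtain multi-scale features.
stage_fields = [receptive_field([3] * n) for n in (1, 2, 3)]

alpha, signs = binarize([0.5, -1.5, 1.0, -1.0])

print("7x7 weights:", large_params, "bits:", large_bits)
print("3x(3x3) weights:", stacked_params, "bits:", stacked_bits)
print("receptive field per stage:", stage_fields)
print("alpha:", alpha, "signs:", signs)
```

Under these assumptions the stacked module reaches the same 7x7 receptive field as the large kernel while using fewer weights, each stored in one bit. Actual speed-ups depend on how the binary convolutions are implemented on the target hardware.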


Deep Learning; Pose Estimation; Binarized Network; Multi-Scale


Supported by : National Research Foundation of Korea (NRF)

