Deep Learning-Based Human Motion Denoising

  • Kim, Seong Uk (Dept. of Computer Science, Kangwon National University) ;
  • Im, Hyeonseung (Dept. of Computer Science, Kangwon National University) ;
  • Kim, Jongmin (Dept. of Computer Science, Kangwon National University)
  • Received : 2019.12.10
  • Accepted : 2019.12.26
  • Published : 2019.12.31

Abstract

In this paper, we propose a novel method for denoising human motion using a bidirectional recurrent neural network (BRNN) with an attention mechanism. Corrupted motion captured from a single 3D depth-sensor camera is automatically corrected by projecting it onto a well-established smooth motion manifold. Incorporating an attention mechanism into the BRNN achieves better optimization results and higher accuracy than other deep learning frameworks, because higher weights are selectively assigned to the more important input poses at specific frames when encoding the input motion. Experimental results show that our approach effectively handles various types of motion and noise, and we believe our method can readily serve as a post-processing step in motion capture applications.
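To illustrate the attention idea described above, the following NumPy sketch shows how per-frame attention weights can be computed over a sequence of pose encodings using simple dot-product attention. This is a minimal illustration, not the authors' implementation; the function names, dimensions, and the choice of dot-product scoring are all hypothetical.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array of frame scores."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def attend_over_frames(frame_encodings, query):
    """Weight each frame's pose encoding by its relevance to `query`.

    frame_encodings: (T, D) array, one D-dim encoding per frame
    query:           (D,)  array, e.g. a decoder state
    Returns the (D,) context vector and the (T,) attention weights.
    """
    scores = frame_encodings @ query     # dot-product relevance per frame
    weights = softmax(scores)            # higher weight -> more important pose
    context = weights @ frame_encodings  # weighted sum of frame encodings
    return context, weights

# Toy example: 5 frames of 4-dim encodings from a (hypothetical) BRNN encoder.
rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 4))
q = rng.standard_normal(4)
ctx, w = attend_over_frames(enc, q)
```

The attention weights form a distribution over frames, so a frame whose pose encoding aligns well with the query contributes more to the context vector; this is the mechanism by which "a more important input pose at a specific frame" receives a higher weight during encoding.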
