A Robust Approach for Human Activity Recognition Using 3-D Body Joint Motion Features with Deep Belief Network

  • Uddin, Md. Zia (Department of Informatics, University of Oslo)
  • Kim, Jaehyoun (Department of Computer Education, Sungkyunkwan University)
  • Received : 2016.08.29
  • Accepted : 2017.02.09
  • Published : 2017.02.28

Abstract

Computer vision-based human activity recognition (HAR) has attracted considerable attention in recent years owing to its applications in various fields, such as smart home healthcare for elderly people. Among the goals of a video-based activity recognition system is to react to people's behavior, allowing the system to proactively assist them with their tasks. This work proposes a novel approach to depth video-based human activity recognition using joint-based motion features of depth body shapes and a Deep Belief Network (DBN). From the depth video, the body parts of the person performing an activity are first segmented by means of a trained random forest. Motion features representing the magnitude and direction of each joint's movement to the next frame are then extracted. Finally, these features are used to train a DBN, which is then applied for recognition. The proposed HAR approach showed superior performance over conventional approaches on both private and public datasets, indicating its suitability for practical applications in smart environments.
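
To make the pipeline concrete, below is a minimal Python sketch, not the authors' implementation, of the two core steps the abstract describes: extracting per-joint motion features (the displacement magnitude and unit direction of each 3-D joint between consecutive frames) and training a DBN-style classifier on them. The joint count, layer sizes, toy data, and the use of scikit-learn's BernoulliRBM as the stacked building block are illustrative assumptions.

# Hedged sketch of the abstract's pipeline; hyperparameters and data are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

def joint_motion_features(joints):
    """joints: (T, J, 3) array of 3-D joint positions over T frames.
    Returns one feature vector per frame transition: the displacement
    magnitude and unit direction of every joint from frame t to t+1."""
    disp = np.diff(joints, axis=0)                      # (T-1, J, 3) displacements
    mag = np.linalg.norm(disp, axis=2, keepdims=True)   # (T-1, J, 1) magnitudes
    direction = disp / np.maximum(mag, 1e-8)            # unit direction vectors
    feats = np.concatenate([mag, direction], axis=2)    # (T-1, J, 4)
    return feats.reshape(feats.shape[0], -1)            # flatten per transition

# DBN-style classifier: two stacked RBM layers learn a representation,
# and a logistic-regression output layer is fit on top of it.
dbn = Pipeline([
    ("scale", MinMaxScaler()),          # RBMs expect inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20)),
    ("rbm2", BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=20)),
    ("clf", LogisticRegression(max_iter=1000)),
])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in data: 40 clips of 30 frames with 20 joints, two classes.
    X = np.vstack([joint_motion_features(rng.normal(size=(30, 20, 3)))
                   for _ in range(40)])
    y = np.repeat(np.arange(2), X.shape[0] // 2)
    dbn.fit(X, y)
    print("training accuracy:", dbn.score(X, y))

In the paper itself the joints come from depth body shapes segmented by a trained random forest, rather than from the synthetic trajectories used above.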
