Improvement of Accuracy for Human Action Recognition by Histogram of Changing Points and Average Speed Descriptors

  • Vu, Thi Ly (School of Information and Communication Engineering, Inha University)
  • Do, Trung Dung (School of Information and Communication Engineering, Inha University)
  • Jin, Cheng-Bin (School of Information and Communication Engineering, Inha University)
  • Li, Shengzhe (School of Information and Communication Engineering, Inha University)
  • Nguyen, Van Huan (School of Information and Communication Engineering, Inha University)
  • Kim, Hakil (School of Information and Communication Engineering, Inha University)
  • Lee, Chongho (School of Information and Communication Engineering, Inha University)
  • Received : 2014.12.09
  • Accepted : 2015.03.09
  • Published : 2015.03.30

Abstract

Human action recognition has recently become an important research topic in computer vision because of its many real-world applications, such as video surveillance, video retrieval, video analysis, and human-computer interaction. This paper evaluates two descriptors that have recently been used in action recognition: the Histogram of Oriented Gradients (HOG) and the Histogram of Optical Flow (HOF). It also proposes two new descriptors: the Histogram of Changing Points (HCP), which represents the change of points within each part of the human body caused by an action, and Average Speed (AS), which measures the average speed of an action. The descriptors are combined into a single strong descriptor that represents human actions by modeling appearance, local motion, changes in each part of the body, and motion speed. The effectiveness of the new descriptors is evaluated in experiments on the KTH and Hollywood datasets.
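
The abstract gives no formulas, so the following is only a rough illustration of how a motion histogram and a speed measure might be concatenated into one descriptor. It assumes the optical flow is already available as a list of (dx, dy) displacement vectors for one video region. The function names, the 8-bin default, and the use of mean flow magnitude as a stand-in for "average speed" are all assumptions made for this sketch, not the authors' definitions of HOF or AS. HCP is not sketched because the abstract does not say how changing points are counted.

```python
import math


def flow_orientation_histogram(flow, n_bins=8):
    """HOF-style histogram: each flow vector votes for its direction bin,
    weighted by its magnitude. The result is L1-normalised.
    (Hypothetical helper; binning scheme is an assumption.)"""
    hist = [0.0] * n_bins
    bin_width = 2 * math.pi / n_bins
    for dx, dy in flow:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0.0:
            continue  # a zero vector has no direction to vote for
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
        hist[int(angle / bin_width) % n_bins] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def average_speed(flow):
    """Mean flow magnitude over the region, used here as an assumed
    proxy for an average-speed measure."""
    if not flow:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in flow) / len(flow)


def combined_descriptor(flow, n_bins=8):
    """Concatenate the direction histogram and the speed value
    into a single feature vector."""
    return flow_orientation_histogram(flow, n_bins) + [average_speed(flow)]
```

In a full pipeline, a vector like this would be computed per region or per body part and passed to a classifier; the appearance (HOG) and HCP components would be appended in the same way once their definitions are fixed.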
