A Tree Regularized Classifier-Exploiting Hierarchical Structure Information in Feature Vector for Human Action Recognition

  • Luo, Huiwu (National Key Laboratory of Automatic Target Recognition (ATR), School of Electronic Science and Engineering, National University of Defense Technology)
  • Zhao, Fei (National Key Laboratory of Automatic Target Recognition (ATR), School of Electronic Science and Engineering, National University of Defense Technology)
  • Chen, Shangfeng (National Key Laboratory of Automatic Target Recognition (ATR), School of Electronic Science and Engineering, National University of Defense Technology)
  • Lu, Huanzhang (National Key Laboratory of Automatic Target Recognition (ATR), School of Electronic Science and Engineering, National University of Defense Technology)
  • Received : 2016.10.03
  • Accepted : 2016.12.27
  • Published : 2017.03.31

Abstract

The bag-of-visual-words model is popular in human action recognition, but it usually suffers from two deficiencies: loss of the spatial and temporal configuration of local features, and large quantization error in its feature coding procedure. To overcome both, we combine sparse coding with a spatio-temporal pyramid and treat this method as the baseline. More importantly, and as the focus of this paper, we find that the feature vector constructed by the baseline method has a hierarchical structure. To exploit this structure for better recognition accuracy, we propose a tree regularized classifier that conveys the hierarchical structure information. The main contributions of this paper are as follows. First, we introduce a tree regularized classifier to encode the hierarchical structure information in the feature vector for human action recognition. Second, we present an optimization algorithm to learn the parameters of the proposed classifier. Third, we evaluate the proposed classifier on the YouTube, Hollywood2, and UCF50 datasets. The experimental results show that it outperforms SVM and other popular classifiers and achieves promising results on all three datasets.
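
The abstract does not state the exact form of the tree regularizer, so the following is only an illustrative sketch under stated assumptions. It assumes that the pyramid feature vector is a concatenation of one block of pooled codes per pyramid cell, that the cells form a complete tree, and that the regularizer is a tree-structured group lasso penalty (a sum of weighted l2 norms over subtrees), which is a common formulation in the structured-sparsity literature. The function names `build_tree_groups`, `tree_penalty`, and `tree_prox` are hypothetical and are not taken from the paper.

```python
import numpy as np


def build_tree_groups(branching, depth, block_size):
    """Index groups for a complete tree of pyramid cells (assumed layout).

    Cells are numbered breadth-first; cell i owns the coefficient block
    [i * block_size, (i + 1) * block_size). Each group holds the indices of
    one cell plus all of its descendants. Groups are returned with children
    before parents, which is the order the proximal step below requires.
    """
    n_nodes = sum(branching ** d for d in range(depth))

    def subtree(i):
        nodes = [i]
        for c in range(branching * i + 1, branching * i + branching + 1):
            if c < n_nodes:
                nodes += subtree(c)
        return nodes

    groups = []
    for i in reversed(range(n_nodes)):  # deeper cells first
        idx = np.concatenate(
            [np.arange(n * block_size, (n + 1) * block_size) for n in subtree(i)]
        )
        groups.append(np.sort(idx))
    return groups


def tree_penalty(w, groups, weights=None):
    """Sum of weighted l2 norms of w restricted to each subtree group."""
    if weights is None:
        weights = np.ones(len(groups))
    return float(sum(eta * np.linalg.norm(w[g]) for eta, g in zip(weights, groups)))


def tree_prox(w, groups, lam, weights=None):
    """Proximal operator of lam * tree_penalty.

    For tree-structured groups, applying group soft-thresholding once per
    group, from the leaves up to the root, yields the proximal operator.
    """
    if weights is None:
        weights = np.ones(len(groups))
    out = np.array(w, dtype=float)
    for eta, g in zip(weights, groups):
        norm = np.linalg.norm(out[g])
        scale = max(0.0, 1.0 - lam * eta / norm) if norm > 0 else 0.0
        out[g] *= scale
    return out


# Toy layout: one root cell with two child cells, two coefficients per cell.
groups = build_tree_groups(branching=2, depth=2, block_size=2)
w = np.array([0.0, 0.0, 3.0, 4.0, 0.0, 0.0])
w_shrunk = tree_prox(w, groups, lam=1.0)
```

In this toy layout, shrinking a coefficient that belongs to a child cell passes through two thresholding steps (the child group, then the root group), while a root-only coefficient passes through one. A step like `tree_prox` could be used inside a proximal gradient or ADMM loop for a regularized classifier; whether the paper's optimization algorithm takes that form is not stated in the abstract.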
