Human Action Recognition in Still Image Using Weighted Bag-of-Features and Ensemble Decision Trees

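  • 홍준혁 (Multimedia Communication Laboratory, Department of Computer Engineering, Keimyung University) ;
  • 고병철 (Computer Vision & Pattern Recognition Laboratory, Department of Computer Engineering, Keimyung University) ;
  • 남재열 (Multimedia Communication Laboratory, Department of Computer Engineering)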
  • Received : 2012.11.13
  • Accepted : 2012.12.24
  • Published : 2013.01.31

Abstract

This paper proposes a human action recognition method that combines a bag-of-features (BoF) representation, built from center-symmetric local binary pattern (CS-LBP) features and a spatial pyramid, with a random forest classifier. To construct the BoF, an image is divided into a dense regular grid of patches, and a CS-LBP feature is extracted from each patch. A codebook, which serves as the visual vocabulary, is formed by k-means clustering of a random subset of the patches. To improve action discrimination, local BoF histograms are estimated at three subdivision levels of a spatial pyramid, and a weighted BoF histogram is generated by concatenating the local histograms. For action classification, a random forest, which is an ensemble of decision trees, is built to model the distribution of each action class. The random forest combined with the weighted BoF histogram is successfully applied to the Stanford 40 Actions dataset, which includes a variety of human action images, and its classification performance is better than that of other methods. Furthermore, the proposed method allows action recognition to be performed in near real time.
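The CS-LBP feature named in the abstract can be illustrated with a minimal sketch. It follows the standard CS-LBP definition (four center-symmetric neighbor pairs compared against a threshold, giving a 4-bit code per pixel). The radius of 1, the default threshold, the histogram normalization, and the function names `cs_lbp_codes` and `cs_lbp_histogram` are assumptions made for illustration, not settings reported by the authors.

```python
def cs_lbp_codes(patch, threshold=0.0):
    """Return the CS-LBP code (0..15) of every interior pixel of a 2-D patch.

    patch is a list of rows of gray values. The radius (1 pixel) and the
    threshold default are illustrative assumptions.
    """
    # One offset per center-symmetric pair; the partner is the mirrored offset.
    half_offsets = [(0, 1), (1, 1), (1, 0), (1, -1)]
    rows, cols = len(patch), len(patch[0])
    codes = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for bit, (dr, dc) in enumerate(half_offsets):
                near = patch[r + dr][c + dc]
                opposite = patch[r - dr][c - dc]
                # Set the bit when the pair difference exceeds the threshold.
                if near - opposite > threshold:
                    code |= 1 << bit
            codes.append(code)
    return codes


def cs_lbp_histogram(patch, threshold=0.0):
    """Return a 16-bin histogram of CS-LBP codes, normalized to sum to 1."""
    hist = [0] * 16
    for code in cs_lbp_codes(patch, threshold):
        hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 16
```

Because only opposite neighbors are compared, each pixel yields 16 possible codes instead of the 256 of the basic LBP, which keeps the per-patch descriptor short before clustering.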

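This paper proposes an algorithm that recognizes human actions by generating a bag of features (BoF) from center-symmetric local binary pattern (CS-LBP) features and a spatial pyramid, and applying it to a random forest classifier. To generate the BoF, the image is divided into uniform patches and a CS-LBP feature is extracted from each patch. To improve action classification performance, a codebook is generated by applying k-means clustering to the feature vectors extracted from the patches. To account for the local characteristics of the image, a spatial pyramid is applied, and the BoF extracted at each spatial level is weighted and finally combined into a single feature vector. For action classification, the random forest, an ensemble of decision trees, builds a classification model for each action class during training. The random forest with the weighted BoF successfully classified the Stanford 40 Actions data, which contains a variety of human action images. Its classification performance was also comparable to or better than that of existing methods, and it recognized a single image quickly.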
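The weighted spatial-pyramid BoF described in the abstract can be sketched as follows. Each patch is assigned to its nearest code word, a histogram is built per pyramid cell at three levels (1x1, 2x2, 4x4), and the weighted histograms are concatenated. The level weights (0.25, 0.25, 0.5) are borrowed from the common spatial pyramid matching scheme and are an assumption, as are the function names and the normalized patch coordinates. The abstract does not state the authors' weights.

```python
def nearest_codeword(feature, codebook):
    """Return the index of the code word closest to feature (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((f - w) ** 2 for f, w in zip(feature, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def weighted_pyramid_bof(patches, codebook, level_weights=(0.25, 0.25, 0.5)):
    """Concatenate weighted per-cell BoF histograms over the pyramid levels.

    patches: list of (x, y, feature) with x, y normalized to [0, 1].
    level_weights: one weight per level (hypothetical values, see lead-in).
    """
    k = len(codebook)
    descriptor = []
    for level, weight in enumerate(level_weights):
        cells = 2 ** level  # cells per side: 1, 2, 4
        hists = [[0.0] * k for _ in range(cells * cells)]
        for x, y, feature in patches:
            # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            hists[row * cells + col][nearest_codeword(feature, codebook)] += 1.0
        for hist in hists:
            descriptor.extend(weight * count for count in hist)
    return descriptor
```

With a codebook of k words, the resulting vector has 21k entries (1 + 4 + 16 cells). In a pipeline like the one described here, this vector would be the input to the random forest classifier.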
