Development Trends in Object Detection Performance Using Deep Convolutional Neural Networks

  • Published : 2017.01.30

Abstract

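Advances in new video media service technologies have created demand for a wide range of image recognition techniques. In particular, detecting specific objects in images is a core technology that opens up diverse applications, such as object-related advertising and services. For object detection to be actively adopted in broadcast media technology, algorithms that are both fast and accurate must be developed. This paper analyzes object detection methods based on Deep Convolutional Neural Networks, which outperform traditional object detection methods. We present the research background and development trends of the major object detection methods introduced in recent years, and analyze the core algorithm, strengths, and weaknesses of each. We also introduce the representative datasets used to evaluate object detection performance and compare the methods in terms of network architecture and size, training data, and other factors. Finally, based on this analysis, we anticipate the future directions and potential applications of object detection methods.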
