Object Tracking Method Using Deep Learning and Kalman Filter

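  • 김기철 (Department of Multimedia Engineering, Graduate School of Information and Communications, Hanbat National University)
  • 손소희 (Department of Multimedia Engineering, Graduate School of Information and Communications, Hanbat National University)
  • 김민섭 (Department of Multimedia Engineering, Graduate School of Information and Communications, Hanbat National University)
  • 전진우 (Electronics and Telecommunications Research Institute)
  • 이인재 (Electronics and Telecommunications Research Institute)
  • 차지훈 (Electronics and Telecommunications Research Institute)
  • 최해철 (Department of Multimedia Engineering, Graduate School of Information and Communications, Hanbat National University)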
  • Received : 2019.04.15
  • Accepted : 2019.05.24
  • Published : 2019.05.30

Abstract

Representative deep learning algorithms include convolutional neural networks (CNNs), used mainly for image recognition, and recurrent neural networks (RNNs), used mainly for speech recognition and natural language processing. CNNs learn features automatically from data, including the filters that generate the feature maps, and have become the mainstream approach to image recognition thanks to their strong performance. In object detection, algorithms such as R-CNN were subsequently introduced to improve on CNN performance, and more recently algorithms such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector) have been proposed to increase detection speed. However, because these deep learning-based detection networks decide whether detection succeeds on individual still images, stable object tracking and detection in video requires a separate tracking capability. This paper therefore proposes combining a Kalman filter with a deep learning-based detection network to improve object tracking and detection performance in video. YOLO v2, which is capable of real-time processing, is used as the detection network. In experiments, the proposed method improved IoU by 7.7% over the baseline YOLO v2 network and ran at 20 fps on FHD video.
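As a rough illustration of the idea in the abstract, the sketch below smooths a detector's box centres with a constant-velocity Kalman filter and falls back on the filter's prediction when a frame has no detection. The state layout, the noise values `q` and `r`, and the names `BoxCenterKalman` and `track` are assumptions made for this example. The paper's actual filter design and its coupling to YOLO v2 are not described in this section.

```python
import numpy as np


class BoxCenterKalman:
    """Constant-velocity Kalman filter over a box centre (illustrative only)."""

    def __init__(self, cx, cy, dt=1.0, q=1e-2, r=1.0):
        # State x = [cx, cy, vx, vy]; q and r are assumed noise levels.
        self.x = np.array([cx, cy, 0.0, 0.0], dtype=float)
        self.P = np.eye(4) * 10.0  # large initial uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q
        self.R = np.eye(2) * r

    def predict(self):
        # Propagate the state and covariance one frame ahead.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, z):
        # Correct the prediction with a measured centre z = (cx, cy).
        z = np.asarray(z, dtype=float)
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()


def track(detections):
    """detections: list of (cx, cy), or None where the detector missed."""
    kf = None
    out = []
    for z in detections:
        if kf is None:
            if z is None:
                out.append(None)  # nothing to track yet
                continue
            kf = BoxCenterKalman(z[0], z[1])
            out.append((float(z[0]), float(z[1])))
            continue
        pred = kf.predict()
        # On a missed detection, keep the track alive with the prediction.
        est = kf.update(z) if z is not None else pred
        out.append((float(est[0]), float(est[1])))
    return out
```

For example, `track([(0, 0), (1, 1), (2, 2), None, (4, 4)])` still returns a position for the fourth frame, extrapolated from the estimated velocity, which is the behaviour a per-frame detector cannot provide by itself.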

Fig. 1. Overall flowchart of the Kalman filter
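The figure itself is not reproduced here. For orientation, the standard linear Kalman filter recursion that such a flowchart usually depicts is written out below. The symbols are assumptions following common textbook notation (state estimate \hat{x}, covariance P, transition matrix F, measurement matrix H, process and measurement noise covariances Q and R, measurement z_k) and may differ from the paper's own notation.

```latex
\begin{align}
% Prediction (time update)
\hat{x}_{k|k-1} &= F\,\hat{x}_{k-1|k-1} \\
P_{k|k-1}       &= F\,P_{k-1|k-1}\,F^{\top} + Q \\
% Correction (measurement update)
K_k             &= P_{k|k-1} H^{\top} \left( H P_{k|k-1} H^{\top} + R \right)^{-1} \\
\hat{x}_{k|k}   &= \hat{x}_{k|k-1} + K_k \left( z_k - H\,\hat{x}_{k|k-1} \right) \\
P_{k|k}         &= \left( I - K_k H \right) P_{k|k-1}
\end{align}
```

When no measurement is available for a frame, only the two prediction equations are applied and the predicted state serves as the estimate.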

Fig. 2. Overall system flowchart

Fig. 3. Flowchart of the proposed method

Fig. 4. Examples of training data with various backgrounds

Fig. 5. Calculation of IoU and evaluation at various IoU values
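Since the figure is not reproduced here, the following minimal sketch shows the usual definition of IoU (intersection over union) for two axis-aligned boxes: the area of their overlap divided by the area of their union. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions for this example, not taken from the paper.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) corners."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; clamp to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0
```

Two 2×2 boxes offset by one unit in each direction overlap in a 1×1 square, so `iou((0, 0, 2, 2), (1, 1, 3, 3))` evaluates to 1/7. Identical boxes give 1.0 and disjoint boxes give 0.0.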

Fig. 6. IoU results of the YOLO v2 network

Fig. 7. Comparison of IoU results for the YOLO v2 network and the proposed method

Fig. 8. Examples of detection failure (YOLO v2) and tracking success (proposed method)

Table 1. Network configuration of the detection system
