Fingertip Detection through Atrous Convolution and Grad-CAM


  • Received : 2019.11.05
  • Accepted : 2019.11.29
  • Published : 2019.12.01

Abstract

With the development of deep learning technology, research is being actively carried out on user-friendly interfaces suitable for virtual reality and augmented reality applications. To support an interface based on the user's hands, this paper proposes a deep-learning-based fingertip detection method that tracks fingertip coordinates so that users can select virtual objects or write and draw in the air. The method first uses Grad-CAM to crop the approximate fingertip region from the input image, then applies a convolutional neural network with atrous convolution to the cropped image to locate the fingertip. This approach is simpler and easier to implement than existing object detection algorithms and requires no separate object-annotation pre-processing. To verify the method, we implemented an air-writing application: with an average recognition rate of 81% and a processing time of 76 ms, writing in the air was smooth and free of delay, demonstrating that the application can be used in real time.
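The first stage described above, cropping a rough fingertip region from a Grad-CAM heatmap, can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the heatmap is assumed to be normalized to [0, 1] and already resized to the input resolution (as in standard Grad-CAM post-processing), and the 0.5 threshold is a hypothetical choice.

```python
import numpy as np

def crop_from_heatmap(image, heatmap, thresh=0.5):
    """Crop `image` to the bounding box of heatmap activations
    at or above `thresh`. The heatmap is assumed normalized to
    [0, 1] and resized to the image resolution."""
    ys, xs = np.where(heatmap >= thresh)
    if ys.size == 0:                 # nothing salient: keep the full image
        return image
    y0, y1 = ys.min(), ys.max() + 1  # inclusive box -> exclusive slice
    x0, x1 = xs.min(), xs.max() + 1
    return image[y0:y1, x0:x1]

# Synthetic 8x8 "image" and a heatmap peaking where a fingertip might be.
image = np.arange(64, dtype=float).reshape(8, 8)
heatmap = np.zeros((8, 8))
heatmap[2:5, 3:6] = 1.0              # hot 3x3 region

crop = crop_from_heatmap(image, heatmap)
# crop.shape == (3, 3)
```

The cropped patch, rather than the full frame, is then passed to the second-stage network, which keeps that stage small and fast.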

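The second stage applies a convolutional network whose layers use atrous (dilated) convolution, which spaces the kernel taps apart to enlarge the receptive field without adding weights. A minimal 1-D NumPy sketch of the operation itself (not the paper's network) makes the effect visible:

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """1-D atrous (dilated) convolution, 'valid' padding: the kernel
    taps are spaced `rate` samples apart, so a k-tap kernel covers a
    footprint of rate*(k-1)+1 samples with only k weights."""
    k = len(kernel)
    span = rate * (k - 1) + 1            # effective kernel footprint
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        taps = x[i : i + span : rate]    # every `rate`-th sample
        out[i] = np.dot(taps, kernel)
    return out

signal = np.arange(10, dtype=float)
kernel = np.array([1.0, 1.0, 1.0])

dense  = atrous_conv1d(signal, kernel, rate=1)  # footprint 3
atrous = atrous_conv1d(signal, kernel, rate=2)  # footprint 5, same 3 weights
```

With `rate=2` the same three weights see a five-sample window, which is why atrous convolution can capture wider context around the fingertip without the parameter cost of a larger kernel.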

Keywords

Acknowledgement

Supported by: Seokyeong University

This research was supported by the Seokyeong University research fund for the 2019 academic year.
