The Implementation of Fast Object Recognition Using Parallel Processing on CPU and GPU

  • Published: 2009.05.01

Abstract

This paper presents a fast feature extraction method for autonomous mobile robots that uses parallel processing based on OpenMP, SSE (Streaming SIMD Extensions), and CUDA programming. In the CPU version, the algorithms and code are first optimized and then parallelized. The parallel algorithms are debugged to maintain the same level of performance, and the process of extracting key points and obtaining the dominant orientation of each key point is parallelized. After extraction, a parallel descriptor is constructed with SSE instructions. A GPU version based on SIFT is also implemented with parallel processing using CUDA. The GPU-parallel descriptor achieves a speedup of up to five times over the CPU-parallel descriptor, but it shows lower performance than the CPU version. The CPU version also runs four and a half times faster than the original SIFT while maintaining robust performance.
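
The CPU-side ideas in the abstract can be illustrated with a minimal C++ sketch. It is not the authors' implementation: the `Keypoint` struct and the functions `dominant_orientation`, `all_orientations`, and `normalize_descriptor` are hypothetical names introduced here, the orientation histogram omits the Gaussian weighting and scale handling of full SIFT, and the 36-bin histogram follows Lowe's SIFT paper. The sketch assumes that keypoints can be processed independently (which is what makes an OpenMP `parallel for` applicable) and that the descriptor length is a multiple of four floats, as with a 128-element SIFT descriptor.

```cpp
#include <cassert>
#include <cmath>
#include <vector>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics; scalar fallback is used otherwise
#endif

// Hypothetical keypoint record: pixel position only (scale is omitted).
struct Keypoint { int x; int y; };

constexpr int kBins = 36;                 // orientation bins, as in Lowe's SIFT
constexpr float kPi = 3.14159265358979f;

// Index of the strongest gradient-orientation bin in a (2r+1)x(2r+1) window.
// Simplification: no Gaussian weighting, no peak interpolation.
int dominant_orientation(const std::vector<float>& img, int w, int h,
                         const Keypoint& kp, int r) {
    float hist[kBins] = {0.0f};
    for (int dy = -r; dy <= r; ++dy) {
        for (int dx = -r; dx <= r; ++dx) {
            int x = kp.x + dx, y = kp.y + dy;
            if (x < 1 || y < 1 || x >= w - 1 || y >= h - 1) continue;
            float gx = img[y * w + x + 1] - img[y * w + x - 1];
            float gy = img[(y + 1) * w + x] - img[(y - 1) * w + x];
            float ang = std::atan2(gy, gx);  // in [-pi, pi]
            int bin = static_cast<int>((ang + kPi) / (2.0f * kPi) * kBins);
            if (bin < 0) bin = 0;
            bin %= kBins;                    // +pi wraps to the same bin as -pi
            hist[bin] += std::sqrt(gx * gx + gy * gy);
        }
    }
    int best = 0;
    for (int b = 1; b < kBins; ++b)
        if (hist[b] > hist[best]) best = b;
    return best;
}

// Keypoints are independent, so the loop can be split across CPU cores.
// Without OpenMP enabled, the pragma is ignored and the loop runs serially.
std::vector<int> all_orientations(const std::vector<float>& img, int w, int h,
                                  const std::vector<Keypoint>& kps, int r) {
    std::vector<int> out(kps.size());
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < static_cast<int>(kps.size()); ++i)
        out[i] = dominant_orientation(img, w, h, kps[i], r);
    return out;
}

// Scale a descriptor to unit L2 norm, four floats at a time when SSE exists.
void normalize_descriptor(float* d, int n) {
    assert(n % 4 == 0);
    float sum = 0.0f;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(d + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(v, v));  // accumulate squares per lane
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    for (int i = 0; i < n; ++i) sum += d[i] * d[i];
#endif
    if (sum <= 0.0f) return;                      // leave an all-zero vector as is
    float inv = 1.0f / std::sqrt(sum);
#if defined(__SSE__)
    __m128 s = _mm_set1_ps(inv);
    for (int i = 0; i < n; i += 4)
        _mm_storeu_ps(d + i, _mm_mul_ps(_mm_loadu_ps(d + i), s));
#else
    for (int i = 0; i < n; ++i) d[i] *= inv;
#endif
}
```

With GCC or Clang, compiling with `-fopenmp` enables the parallel loop. The unaligned loads (`_mm_loadu_ps`) are a conservative choice here; aligned storage and `_mm_load_ps` would be the usual next step when tuning, and a CUDA version would typically map one keypoint or one descriptor to a thread block instead.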
