Efficient CUDA Implementation of Multiple Planes Fitting Using RANSAC

  • Cho, Tai-Hoon (School of Computer Science and Engineering, Korea University of Technology and Education)
  • Received: 2019.02.06
  • Accepted: 2019.03.08
  • Published: 2019.04.30

Abstract

RANSAC (RANdom SAmple Consensus) based algorithms are widely used to fit lines, circles, ellipses, and other models to data containing outliers. CUDA is currently the most widely used platform for GPU computing, which offers massively parallel processing capability. This paper proposes an efficient CUDA implementation of multiple-plane fitting using RANSAC on 3D point data, in which each plane is fitted to its own set of 3D points. The performance of the proposed algorithm is compared with a CPU implementation using both artificially generated data and real 3D height data of a PCB. The speed-up over the CPU appears to be higher for data with a lower inlier ratio, more planes to fit, and more points per plane. The method can be easily applied to a wide variety of other fitting applications.

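As a method for fitting data that contain outliers, the RANSAC (RANdom SAmple Consensus) algorithm is widely used for fitting lines, circles, ellipses, and similar models. Given 3D point data for multiple planes, this paper proposes an algorithm that efficiently performs RANSAC-based fitting of each plane using CUDA, a GPU platform that has recently been widely used for deep learning and other workloads. The performance of the proposed algorithm is demonstrated in comparison with a CPU using both simulated and real data. The results show that the speed-up over the CPU increases as the number of outliers grows and the inlier ratio falls, and that the gain from parallel processing increases with the number of planes and with the number of data points per plane. The proposed method can easily be applied to fitting problems other than multiple-plane fitting.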
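
The task described in the abstract, fitting one plane per set of 3D points with RANSAC, can be illustrated with a minimal CPU-side sketch in Python. This is not the paper's CUDA implementation. The function names (`plane_from_points`, `ransac_plane`, `make_plane_points`), the plane form z = a*x + b*y + c, the vertical-distance inlier test, the threshold, the iteration count, and the noise and outlier ranges are all assumptions made for illustration.

```python
import random


def plane_from_points(p, q, r):
    """Plane z = a*x + b*y + c through three points, or None if degenerate.

    The z = a*x + b*y + c form is an assumption of this sketch.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p, q, r
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if abs(det) < 1e-12:  # points are (nearly) collinear in the xy-plane
        return None
    a = ((z2 - z1) * (y3 - y1) - (z3 - z1) * (y2 - y1)) / det
    b = ((x2 - x1) * (z3 - z1) - (x3 - x1) * (z2 - z1)) / det
    c = z1 - a * x1 - b * y1
    return a, b, c


def ransac_plane(points, iterations=200, threshold=0.05, rng=None):
    """Return (best_model, inlier_count) for a single set of 3D points."""
    rng = rng or random.Random(0)
    best_model, best_count = None, -1
    for _ in range(iterations):
        # Hypothesis: the plane through a random minimal sample of 3 points.
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        a, b, c = model
        # Consensus: count points whose vertical residual is within threshold.
        count = sum(1 for x, y, z in points
                    if abs(a * x + b * y + c - z) <= threshold)
        if count > best_count:
            best_model, best_count = model, count
    return best_model, best_count


def make_plane_points(a, b, c, n, inlier_ratio, rng):
    """Synthetic data: noisy plane points mixed with uniform outliers (hypothetical)."""
    pts = []
    for _ in range(n):
        x, y = rng.uniform(0, 10), rng.uniform(0, 10)
        if rng.random() < inlier_ratio:
            z = a * x + b * y + c + rng.gauss(0, 0.01)
        else:
            z = rng.uniform(-50, 50)
        pts.append((x, y, z))
    return pts


# Fit several planes, one independent point set per plane.
rng = random.Random(42)
true_planes = [(-0.1, 0.1, 3.0), (1.0, 2.0, 3.0)]
point_sets = [make_plane_points(a, b, c, 2000, 0.6, rng) for a, b, c in true_planes]
fits = [ransac_plane(pts, rng=rng) for pts in point_sets]
```

In this sketch each plane's fit is independent of the others, and within one fit each hypothesis counts its inliers independently. Those are the two levels of independence that a GPU version could plausibly exploit, although the mapping to CUDA blocks and threads is not shown here.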

Fig. 1 Thread Hierarchy in CUDA Programming [10]

Fig. 2 Height data of a PCB [µm]

Fig. 3 The 3D data in each white rectangular area are fitted to a plane. There are 58 rectangles, each containing between 12,390 and 66,528 points.

Table 1 Execution time for fitting 400 planes, 40,000 points per plane (a=-0.1, b=0.1, c=3) [ms]

Table 2 Execution time for fitting 400 planes, 40,000 points per plane (a=1, b=2, c=3) [ms]

Table 3 Breakdown of execution time for fitting 400 planes, 40,000 points per plane (a=-0.1, b=0.1, c=3) [ms]

Table 4 Execution time for fitting 40 planes, 40,000 points per plane (a=1, b=2, c=3) [ms]

Table 5 Execution time for fitting 400 planes, 4,000 points per plane (a=1, b=2, c=3) [ms]

Table 6 Execution time for the PCB height data [ms]

References

  1. M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," Communications of the ACM, vol. 24, pp. 381-395, 1981. https://doi.org/10.1145/358669.358692
  2. Wikipedia. Description of GPGPU [Internet]. Available: https://ko.wikipedia.org/wiki/GPGPU
  3. Wikipedia. Description of CUDA [Internet]. Available: https://ko.wikipedia.org/wiki/CUDA
  4. NVIDIA. CUDA C Programming Guide, Version 5.5, 2013.
  5. K. Tyagi. CUDA implementation of circle detection using RANSAC [Internet]. Available: https://github.com/kunaltyagi/cuRANSAC
  6. D. Koguciuk, "Parallel RANSAC for point cloud registration," Foundations of Computing and Decision Sciences, vol. 42, no. 3, pp. 203-217, 2017. https://doi.org/10.1515/fcds-2017-0010
  7. J. Roters and X. Jiang, "FestGPU: A framework for fast robust estimation on GPU," Journal of Real-Time Image Processing, vol. 13, no. 4, pp. 759-772, 2017. https://doi.org/10.1007/s11554-014-0439-5
  8. X. Zhi, J. Yan, Y. Hang, and S. Wang, "Realization of CUDA-based real-time registration and target localization for high-resolution video images," Journal of Real-Time Image Processing, Apr. 2016.
  9. J. Lan, Y. Tian, W. Song, S. Fong, and Z. Su, "A Fast Planner Detection Method in LiDAR Point Clouds Using GPU-based RANSAC," KDD 2018 Workshop on Knowledge Discovery and User Modelling for Smart Cities, Aug. 2018.
  10. Wikipedia. Description of Thread Block [Internet]. Available: https://en.wikipedia.org/wiki/Thread_block
  11. Optimizing Parallel Reduction in CUDA [Internet]. Available: https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf