• Title/Summary/Keyword: Warp Divergence

Search Result 1, Processing Time 0.016 seconds

Analysis of GPU-based Parallel Shifted Sort Algorithm by comparing with General GPU-based Tree Traversal (일반적인 GPU 트리 탐색과의 비교실험을 통한 GPU 기반 병렬 Shifted Sort 알고리즘 분석)

  • Kim, Heesu;Park, Taejung
    • Journal of Digital Contents Society
    • /
    • v.18 no.6
    • /
    • pp.1151-1156
    • /
    • 2017
  • It is common to achieve lower performance in traversing tree data structures in GPU than one expects. In this paper, we analyze the reason of lower-than-expected performance in GPU tree traversal and present that the warp divergences is caused by the branch instructions ("if${\ldots}$ else") which appear commonly in tree traversal CUDA codes. Also, we compare the parallel shifted sort algorithm which can reduce the number of warp divergences with a kd-tree CUDA implementation to show that the shifted sort algorithm can work faster than the kd-tree CUDA implementation thanks to less warp divergences. As the analysis result, the shifted sort algorithm worked about 16-fold faster than the kd-tree CUDA implementation for $2^{23}$ query points and $2^{23}$ data points in $R^3$ space. The performance gaps tend to increase in proportion to the number of query points and data points.