Search | Korea Science

Park, Taejung
- Journal of Digital Contents Society
- /
- v.18 no.3
- /
- pp.613-619
- /
- 2017
This paper presents a CUDA (Compute Unified Device Architecture) code to achieve correct GPU parallel segmented prefix operation results with less than 32 segment length for large data arrays. Mark Harris and Michael Garland had published CUDA code to address the tasks. This paper shows that their code does not generate correct results when the local segment length is less than 32, discusses the cause of the problem, and presents a CUDA code that generates correct results. The segmented parallel prefix operation presented in this paper can be applied as a building block to various large parallel processing algorithms including the k-nearest neighbor search problems.
https://doi.org/10.9728/dcs.2017.18.3.613 인용 PDF KSCI

Ha, Sang-Won;Choi, Moon-Hee;Jun, Tae-Joon;Kim, Jin-Woo;Byun, Hye-Ran;Han, Tack-Don
- Journal of Korea Game Society
- /
- v.12 no.5
- /
- pp.67-78
- /
- 2012
Summed area table allows filtering of arbitrary-width box regions for every pixel in constant time per pixel. This characteristic makes it beneficial in image processing applications where the sum or average of the surrounding pixel intensity is required. Although calculating the summed area table of an image data is primarily a memory bound job consisting of row or column-wise summation, previous works had to endure excessive access to the high latency global memory in order to exploit data parallelism. In this paper, we propose an efficient algorithm for generating the summed area table in the GPGPU environment where the input is decomposed into square sub-images with intermediate data that are propagated between them. By doing so, the global memory access is almost halved compared to the previous methods making an efficient use of the available memory bandwidth. The results show a substantial increase in performance.
https://doi.org/10.7583/JKGS.2012.12.5.67 인용 PDF KSCI