• Title/Summary/Keyword: GPU algorithm

Image-based Collision Detection on GPU (GPU를 이용한 이미지 기반 충돌검사)

  • Jang, Han-Young;Jung, Taek-Sang;Han, Jung-Hyun
    • Proceedings of the Korean HCI Society Conference / 2006.02a / pp.812-817 / 2006
  • This paper presents an image-space algorithm for real-time collision detection that runs entirely on the GPU. For a single object, or for multiple objects that do not collide, front and back faces appear alternately along the view direction; this alternation is violated when objects collide. The algorithm is built on this observation, and the implementation exploits state-of-the-art GPU functionality such as framebuffer objects (FBO), vertex buffer objects (VBO), and occlusion queries. The experimental results show the feasibility of GPU-intensive collision detection and its performance gain in real-time applications such as 3D games.
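
The alternation test described in the abstract can be sketched as a minimal CUDA kernel. This is a hedged illustration, not the paper's code: it assumes the per-pixel depth layers have already been rendered (e.g. by depth peeling) into depth-sorted arrays of records tagged front/back, and names such as `LayerRecord` and `detectCollisionKernel` are invented for the sketch.

```cuda
// Sketch only: per-pixel parity test over pre-peeled depth layers.
// Layers are assumed sorted by depth along the view direction; two
// consecutive faces of the same orientation signal a collision.
#include <cuda_runtime.h>

struct LayerRecord {      // hypothetical record layout
    float depth;          // depth of this peeled layer
    int   isFront;        // 1 = front face, 0 = back face
};

__global__ void detectCollisionKernel(const LayerRecord* layers,
                                      const int* layerCount,
                                      int maxLayers, int numPixels,
                                      int* collisionFlag)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= numPixels) return;

    const LayerRecord* px = layers + p * maxLayers;
    int n = layerCount[p];
    // For disjoint closed objects the faces along the ray alternate
    // front, back, front, back, ...; any repeat means interpenetration.
    for (int i = 1; i < n; ++i) {
        if (px[i].isFront == px[i - 1].isFront) {
            atomicExch(collisionFlag, 1);   // alternation violated
            return;
        }
    }
}
```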

A Parallel Algorithm for Measuring Graph Similarity Using CUDA on GPU (GPU에서 CUDA를 이용한 그래프 유사도 측정을 위한 병렬 알고리즘)

  • Son, Min-Young;Kim, Young-Hak;Choi, Sung-Ja
    • KIISE Transactions on Computing Practices / v.23 no.3 / pp.156-164 / 2017
  • Measuring the similarity of two graphs is a basic tool for solving graph problems in various applications. Most graph algorithms have a high time complexity in the number of vertices and edges. Because Graphics Processing Units (GPUs) offer high computational power at low cost, they have been widely used in graph applications to improve execution time. This paper proposes an efficient parallel algorithm for measuring graph similarity using CUDA in a GPU environment. The experimental results show that the proposed approach brings a considerable improvement in performance and efficiency over CPU-based results, and that the performance gain grows significantly with the size of the graph.
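
The abstract does not spell out the similarity measure used. Purely as a hedged illustration of one-thread-per-cell parallelization, the kernel below computes a Jaccard-style edge-overlap score for two graphs on the same vertex set, given as dense 0/1 adjacency matrices; all names here are invented for the sketch.

```cuda
// Illustrative only: edge overlap of two graphs given as dense 0/1
// adjacency matrices; one thread inspects one matrix cell.
#include <cuda_runtime.h>

__global__ void edgeOverlapKernel(const unsigned char* a,
                                  const unsigned char* b,
                                  int n, int* common, int* total)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // flat cell index
    if (i >= n * n) return;
    if (a[i] & b[i]) atomicAdd(common, 1);  // edge in both graphs
    if (a[i] | b[i]) atomicAdd(total, 1);   // edge in either graph
}
// Host side: similarity = *common / (float)*total after copy-back.
```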

APBT-JPEG Image Coding Based on GPU

  • Wang, Chengyou;Shan, Rongyang;Zhou, Xiao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.4 / pp.1457-1470 / 2015
  • In wireless multimedia sensor networks (WMSN), transmission latency is an increasing problem. As resolution improves, the time spent on image and video compression grows, which seriously affects the real-time performance of WMSN. The core of the JPEG system is the DCT, but DCT-based JPEG is not the best choice: block-based DCT coding suffers from serious blocking artifacts when an image is highly compressed at low bit rates. The all phase biorthogonal transform (APBT) solves that problem, but it lacks a fast algorithm. This paper analyzes the structure of JPEG and proposes a parallel framework to speed up the JPEG algorithm on the GPU, replacing the DCT with the APBT for better reconstructed-image quality. The resulting parallel APBT-JPEG addresses both the real-time requirement of WMSN and the blocking artifacts of DCT-JPEG. The parallel algorithm is designed with NVIDIA's CUDA toolkit. Experimental results show that, compared with the conventional serial APBT-JPEG, the parallel APBT-JPEG reaches a speedup of more than 100 times even on a low-end GPU, and that the reconstructed image outperforms DCT-JPEG in both objective quality and subjective effect. The proposed GPU-parallel APBT can also be used in image compression, video compression, edge detection, and other fields of image processing.
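
A hedged sketch of the per-block parallelization the abstract describes: one CUDA thread block transforms one 8x8 image block as Y = A·X·Aᵀ in shared memory. The basis matrix `A` stands in for the actual APBT matrix, whose entries are not given here; the host is assumed to fill it in.

```cuda
// Sketch: one thread block per 8x8 image block, Y = A * X * A^T.
// 'A' is a placeholder for the 8x8 APBT (or DCT) basis matrix.
#include <cuda_runtime.h>

__constant__ float A[8][8];   // transform basis, uploaded by the host

__global__ void blockTransform(const float* img, float* coeff,
                               int width /* multiple of 8 */)
{
    __shared__ float X[8][8], T[8][8];
    int bx = blockIdx.x * 8, by = blockIdx.y * 8;
    int tx = threadIdx.x, ty = threadIdx.y;         // 8x8 threads

    X[ty][tx] = img[(by + ty) * width + bx + tx];   // load the block
    __syncthreads();

    float s = 0.f;                                  // T = A * X
    for (int k = 0; k < 8; ++k) s += A[ty][k] * X[k][tx];
    T[ty][tx] = s;
    __syncthreads();

    s = 0.f;                                        // Y = T * A^T
    for (int k = 0; k < 8; ++k) s += T[ty][k] * A[tx][k];
    coeff[(by + ty) * width + bx + tx] = s;
}
// Launch: blockTransform<<<dim3(w / 8, h / 8), dim3(8, 8)>>>(img, out, w);
```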

GPU-based Image-space Collision Detection among Closed Objects (GPU를 이용한 이미지 공간 충돌 검사 기법)

  • Jang, Han-Young;Jeong, Taek-Sang;Han, Jung-Hyun
    • Journal of the HCI Society of Korea / v.1 no.1 / pp.45-52 / 2006
  • This paper presents an image-space algorithm for real-time collision detection that runs entirely on the GPU. For a single object, or for multiple objects that do not collide, front and back faces appear alternately along the view direction, and this alternation is violated when objects collide. Based on this observation, the algorithm uses a depth peeling method that renders only the minimal surfaces of objects, not the whole surface, to find collisions. The depth peeling method utilizes state-of-the-art GPU functionality such as framebuffer objects, vertex buffer objects, and occlusion queries; combining these functions, multi-pass rendering and context switches can be done with low overhead. The proposed approach therefore requires fewer rendering passes and less rendering overhead than previous image-space collision detection methods. The algorithm can handle deformable and complex objects, and its precision is governed by the resolution of the render-target texture. The experimental results show the feasibility of GPU-based collision detection and its performance gain in real-time applications such as 3D games.
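
The driver loop for the peel-and-query scheme might look like the host-side C++/OpenGL sketch below; it is an assumption-laden outline, not the paper's code. The FBO bindings, the peeling shader, and `renderScene()` are presumed to exist elsewhere. Each pass peels one depth layer, and the occlusion query's sample count tells the loop when no fragments remain.

```cpp
// Sketch only: depth-peeling driver with one occlusion query per pass.
#include <GL/glew.h>

extern void renderScene();   // assumed: draws the scene with the
                             // peeling shader and layer FBO bound

void peelAllLayers(int maxPasses)
{
    GLuint query;
    glGenQueries(1, &query);

    for (int pass = 0; pass < maxPasses; ++pass) {
        glBeginQuery(GL_SAMPLES_PASSED, query);
        renderScene();                        // peel one depth layer
        glEndQuery(GL_SAMPLES_PASSED);

        GLuint samples = 0;
        glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);
        if (samples == 0) break;              // nothing left to peel
    }
    glDeleteQueries(1, &query);
}
```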

Implementation of CUDA-based Octree Algorithm for Efficient Search for LiDAR Point Cloud (라이다 점군의 효율적 검색을 위한 CUDA 기반 옥트리 알고리듬 구현)

  • Kim, Hyung-Woo;Lee, Yang-Won
    • Korean Journal of Remote Sensing / v.34 no.6_1 / pp.1009-1024 / 2018
  • With the increased use of LiDAR (Light Detection and Ranging), which can acquire datasets of millions of points, efficient search and dimensionality reduction for the point cloud have become crucial techniques. The existing octree-based "parametric algorithm" has proved its efficiency and is included as part of the PCL (Point Cloud Library). However, implementing the algorithm on a GPU (Graphics Processing Unit) is considered very difficult because of structural constraints of the octree implemented in PCL. In this paper, we present a method for running the parametric algorithm in a GPU environment and implement a projection of the queried points in four directions with improved noise reduction.
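
The PCL parametric traversal itself is not reproduced here. As a hedged sketch of one GPU-friendly building block often used when constructing point-cloud octrees, the kernel below assigns each LiDAR point a 30-bit Morton code (10 bits per axis); sorting points by code then groups them by octree cell. The normalization bounds and all names are illustrative.

```cuda
// Sketch: per-point Morton (Z-order) codes, a common first step for
// GPU octree construction. Not the PCL parametric algorithm itself.
#include <cstdint>
#include <cuda_runtime.h>

__device__ uint32_t expandBits(uint32_t v)   // spread 10 bits over 30
{
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

__global__ void mortonKernel(const float3* pts, uint32_t* codes, int n,
                             float3 lo, float3 inv /* 1 / extent */)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Normalize each coordinate to [0,1), quantize to 10 bits.
    uint32_t x = (uint32_t)(fminf((pts[i].x - lo.x) * inv.x, 0.9999f) * 1024.f);
    uint32_t y = (uint32_t)(fminf((pts[i].y - lo.y) * inv.y, 0.9999f) * 1024.f);
    uint32_t z = (uint32_t)(fminf((pts[i].z - lo.z) * inv.z, 0.9999f) * 1024.f);
    codes[i] = (expandBits(x) << 2) | (expandBits(y) << 1) | expandBits(z);
}
// Host side: sort (code, index) pairs, e.g. with thrust::sort_by_key,
// so points in the same octree cell become contiguous.
```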

A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems (방출단층촬영 시스템을 위한 GPU 기반 반복적 기댓값 최대화 재구성 알고리즘 연구)

  • Ha, Woo-Seok;Kim, Soo-Mee;Park, Min-Jae;Lee, Dong-Soo;Lee, Jae-Sung
    • Nuclear Medicine and Molecular Imaging / v.43 no.5 / pp.459-467 / 2009
  • Purpose: The maximum likelihood-expectation maximization (ML-EM) is a statistical reconstruction algorithm derived from a probabilistic model of the emission and detection processes. Although ML-EM has many advantages in accuracy and utility, its use is limited by the computational burden of iterative processing on a CPU (central processing unit). In this study, we developed a parallel computing technique on the GPU (graphics processing unit) for the ML-EM algorithm. Materials and Methods: Using a GeForce 9800 GTX+ graphics card and CUDA (compute unified device architecture), NVIDIA's parallel computing technology, the projection and backprojection in the ML-EM algorithm were parallelized. The time spent per iteration on the projection, on the errors between measured and estimated data, and on the backprojection was measured; total time included the latency of data transfer between RAM and GPU memory. Results: The total computation times of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 sec, respectively, a speedup of about 15 times on the GPU. When the number of iterations increased to 1,024, the CPU- and GPU-based computations took 18 min and 8 sec in total, respectively, an improvement of about 135 times caused by growing delays in the CPU-based computation after a certain number of iterations. In contrast, the GPU-based computation showed very little variation in time per iteration, owing to the use of shared memory. Conclusion: GPU-based parallel computation significantly improved the computing speed and stability of ML-EM. The developed GPU-based ML-EM algorithm can easily be modified for other imaging geometries.
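
A hedged sketch of the parallelization, using a dense system matrix a[i][j] (i = detector bin, j = voxel) purely for clarity; the paper's projectors are geometry-specific. One kernel computes the forward projection Ax, and a second applies the multiplicative ML-EM update x_j ← (x_j / Σ_i a_ij) · Σ_i a_ij y_i / (Ax)_i.

```cuda
// Sketch of one ML-EM iteration with a dense system matrix.
// Illustrative only; not the paper's implementation.
#include <cuda_runtime.h>

__global__ void forwardProject(const float* a, const float* x,
                               float* ax, int nBins, int nVox)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one bin per thread
    if (i >= nBins) return;
    float s = 0.f;
    for (int j = 0; j < nVox; ++j) s += a[i * nVox + j] * x[j];
    ax[i] = s;
}

__global__ void emUpdate(const float* a, const float* y, const float* ax,
                         const float* sens /* column sums of a */,
                         float* x, int nBins, int nVox)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // one voxel per thread
    if (j >= nVox) return;
    float back = 0.f;                               // backproject y / Ax
    for (int i = 0; i < nBins; ++i)
        back += a[i * nVox + j] * y[i] / fmaxf(ax[i], 1e-12f);
    x[j] *= back / fmaxf(sens[j], 1e-12f);          // multiplicative update
}
// One iteration = forwardProject<<<...>>>() then emUpdate<<<...>>>().
```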

Implementation of Lattice Reduction-aided Detector using GPU on SDR System (SDR 시스템에서 GPU를 사용한 Lattice Reduction-aided 검출기 구현)

  • Kim, Tae Hyun;Leem, Hyun Seok;Choi, Seung Won
    • Journal of Korea Society of Digital Industry and Information Management / v.7 no.3 / pp.55-61 / 2011
  • This paper presents an implementation of a Lattice Reduction (LR)-aided detector for Multiple-Input Multiple-Output (MIMO) systems using a Graphics Processing Unit (GPU). The GPU is a parallel processor with many Arithmetic Logic Units (ALUs), so it can minimize the running time of the LR algorithm through parallelization across multiple threads. With the implemented LR-aided detector we verify that it operates far faster than a Maximum Likelihood (ML) detector. The detector has been applied to a WiMAX system to show the feasibility of real-time processing. In addition, we demonstrate that the processing time can be reduced further, at the cost of a 3 dB SNR loss, by limiting the iteration loop of the Lenstra-Lenstra-Lovász (LLL) algorithm frequently used in LR-aided detection.
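
The LLL reduction at the heart of an LR-aided detector, with the iteration cap the abstract mentions, can be sketched on the host as below. This is a real-valued, double-precision outline with delta = 0.75, invented for illustration (a MIMO detector would apply it to the real-valued channel matrix); capping `maxIter` trades reduction quality, and hence SNR, for speed.

```cpp
// Hedged sketch: LLL lattice reduction with a hard iteration cap.
#include <cmath>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;            // basis vectors as rows b[i]

static double dot(const Vec& u, const Vec& v) {
    double s = 0;
    for (size_t k = 0; k < u.size(); ++k) s += u[k] * v[k];
    return s;
}

// Gram-Schmidt orthogonalization: vectors q[i], coefficients mu[i][j].
static void gramSchmidt(const Mat& b, Mat& q, Mat& mu) {
    for (size_t i = 0; i < b.size(); ++i) {
        q[i] = b[i];
        for (size_t j = 0; j < i; ++j) {
            mu[i][j] = dot(b[i], q[j]) / dot(q[j], q[j]);
            for (size_t t = 0; t < q[i].size(); ++t)
                q[i][t] -= mu[i][j] * q[j][t];
        }
    }
}

void lllReduce(Mat& b, int maxIter) {    // delta = 0.75
    size_t n = b.size();
    Mat q(n, Vec(b[0].size())), mu(n, Vec(n, 0.0));
    gramSchmidt(b, q, mu);
    size_t k = 1;
    for (int it = 0; it < maxIter && k < n; ++it) {
        for (int j = (int)k - 1; j >= 0; --j) {      // size-reduce b[k]
            double r = std::round(mu[k][j]);
            if (r == 0.0) continue;
            for (size_t t = 0; t < b[k].size(); ++t) b[k][t] -= r * b[j][t];
            for (int t = 0; t < j; ++t) mu[k][t] -= r * mu[j][t];
            mu[k][j] -= r;
        }
        double rhs = (0.75 - mu[k][k-1] * mu[k][k-1]) * dot(q[k-1], q[k-1]);
        if (dot(q[k], q[k]) >= rhs) ++k;             // Lovasz condition holds
        else {                                       // swap and step back
            std::swap(b[k], b[k-1]);
            gramSchmidt(b, q, mu);
            k = (k > 1) ? k - 1 : 1;
        }
    }
}
```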

Design of Omok AI using Genetic Algorithm and Game Trees and Their Parallel Processing on the GPU (유전 알고리즘과 게임 트리를 병합한 오목 인공지능 설계 및 GPU 기반 병렬 처리 기법)

  • Ahn, Il-Jun;Park, In-Kyu
    • Journal of KIISE: Computer Systems and Theory / v.37 no.2 / pp.66-75 / 2010
  • This paper proposes an efficient method for designing and implementing the artificial intelligence (AI) of the game 'omok' (five-in-a-row) on the GPU. The proposed AI is designed as a cooperative structure combining a min-max game tree and a genetic algorithm. Since the evaluation function requires intensive computation but is performed independently on many candidates in the solution space, it is computed on the GPU in a massively parallel way. The implementation on NVIDIA CUDA and the experimental results show that it significantly outperforms the CPU: the parallel game tree and the genetic algorithm run more than 400 and 300 times faster on the GPU, respectively, than on the CPU. In the proposed cooperative AI, a selective search using the genetic algorithm is performed after the full search using the game tree, both to search the solution space more efficiently and to avoid thread overflow. Experimental results show that the proposed algorithm enhances the AI significantly and lets it run within the time limit given by the game's rules.
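
A hedged sketch of the massively parallel evaluation stage: each CUDA thread scores one candidate board from the genetic population. The toy scoring rule (squared horizontal run lengths) is a placeholder for the paper's evaluation function, and all names are invented.

```cuda
// Illustrative fitness kernel: one thread per candidate 15x15 board;
// the scoring rule is a toy stand-in for the real evaluation function.
#include <cuda_runtime.h>

#define N 15   // omok is played on a 15x15 board

__global__ void evalBoards(const signed char* boards, // numBoards*N*N;
                           int* score, int numBoards) // +1 AI stone,
{                                                     // -1 foe, 0 empty
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= numBoards) return;
    const signed char* board = boards + b * N * N;

    int s = 0;
    for (int r = 0; r < N; ++r) {
        int run = 0;
        for (int c = 0; c < N; ++c) {
            if (board[r * N + c] == 1) {
                ++run;
                s += run * run;        // longer runs score superlinearly
            } else run = 0;
        }
    }
    score[b] = s;
}
```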

An Improved CYK Algorithm based on GPGPU (GPGPU 기반의 개선된 CYK 알고리즘)

  • Kim, Kyoung-Hwan;Han, Yo-Sub
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.409-410 / 2012
  • GPGPU research, which applies the GPU to general-purpose computation, is being actively pursued. The parallelization technique used in previous work does not exploit the GPU's idle resources during data transfers. We use the stream technique to run CPU-GPU data transfers and GPU computation concurrently, making maximal use of the GPU's idle resources during transfers to improve performance. The proposed method performs about 1.1 times better than the existing parallelization method.
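
A hedged sketch of the stream overlap described above (the kernel is a placeholder for the CYK work): input chunks are copied and processed in two CUDA streams so the copy of one chunk overlaps the kernel of another. The host buffer must be pinned (`cudaMallocHost`) for the copies to be truly asynchronous.

```cuda
// Sketch: overlap host-to-device copies with kernels via two streams.
#include <cuda_runtime.h>

__global__ void process(float* d, int n) {      // placeholder workload
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

void run(const float* hPinned, float* dBuf, int total, int chunk)
{
    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);

    for (int off = 0, c = 0; off < total; off += chunk, c ^= 1) {
        int n = (total - off < chunk) ? total - off : chunk;
        // Within a stream, the copy and kernel run in order; across
        // the two streams, one chunk's copy overlaps the other's kernel.
        cudaMemcpyAsync(dBuf + off, hPinned + off, n * sizeof(float),
                        cudaMemcpyHostToDevice, s[c]);
        process<<<(n + 255) / 256, 256, 0, s[c]>>>(dBuf + off, n);
    }
    cudaStreamSynchronize(s[0]);
    cudaStreamSynchronize(s[1]);
    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
}
```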

Technique of Sea-fog Removal base on GPU (GPU 기반의 해무제거 기술)

  • Choi, Woonsik;Ha, Jun;Youn, Woosang;Kwak, Jaemin;Choi, Hyunjun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.576-578 / 2015
  • This paper proposes a sea-fog removal algorithm that helps coastal ships secure a clear view and navigate safely. Interest in marine accidents and vessel safety has increased since the recent Sewol ferry disaster. According to statistics on coastal-ship marine accidents, collisions between ships that failed to secure a clear view in sea fog account for a high percentage of incidents. A number of studies exist on algorithms for images containing sea fog, but such algorithms demand a large amount of computation. In this paper, we improve the computational speed of sea-fog removal with a GPU-based technique suited to real-time video; using the GPU, we accelerated the processing about 250 times.
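
The abstract does not name the defogging algorithm. Purely as a hedged illustration of the per-pixel parallelism such a pipeline relies on, the kernel below computes the dark channel (the minimum over RGB within a local window), a building block common to many single-image dehazing methods; it is not claimed to be the paper's method.

```cuda
// Illustrative only: dark-channel stage of a typical defogging
// pipeline; one thread computes one output pixel.
#include <cuda_runtime.h>

__global__ void darkChannel(const uchar3* rgb, unsigned char* dark,
                            int w, int h, int radius)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    int m = 255;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            int xx = min(max(x + dx, 0), w - 1);   // clamp to the image
            int yy = min(max(y + dy, 0), h - 1);
            uchar3 p = rgb[yy * w + xx];
            int v = min((int)p.x, min((int)p.y, (int)p.z));
            if (v < m) m = v;
        }
    dark[y * w + x] = (unsigned char)m;
}
```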
