• Title/Summary/Keyword: GPU algorithm

Search Result 266, Processing Time 0.022 seconds

Parallel Processing Algorithm of JPEG2000 Using GPU (GPU를 이용한 JPEG2000 병렬 알고리즘)

  • Lee, Dong-Ha;Cho, Shi-Won;Lee, Dong-Wook
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.57 no.6
    • /
    • pp.1075-1080
    • /
    • 2008
  • Most modem computers or game consoles are well equipped with powerful graphics processing units(GPUs) to accelerate graphics operations. However, since the graphics engines in these GPUs are specially designed for graphics operations, we could not take advantage of their computing power for more general nongraphic operations. In this paper, we studied the GPUs graphics engine in order to accelerate the image processing capability. Specifically, we implemented a JPEC2000 decoding/encoding framework that involves both OpenMP and GPU. Initial experimental results show that significant speed-up can be achieved by utilizing the GPU power.

Faster Fingerprint Matching Algorithm Using GPU (GPU를 이용한 보다 빠른 지문 인식 알고리즘)

  • Riaz, Sidra;Lee, Sang-Woong
    • Proceedings of the Korea Multimedia Society Conference
    • /
    • 2012.05a
    • /
    • pp.43-45
    • /
    • 2012
  • This paper is based on embedding the biometrics techniques on GPU for better computational efficiency and fast matching process using the parallel nature of the GPU processors to compare thousands of images for fingerprint recognition in a fraction of a second. In this paper we worked on GPU (INVIDIA GeForce GTX 260 with compute capability 1.3 and dual core-2-dou processor) for fingerprint matching and found that the efficiency is better than the results with related work already done on CMOS, CPU, ARM9, MATLAB Neural Networks etc which shows the better performance of our system in terms of computational time. The features matching process proposed for fingerprint recognition and the verification procedure is done on 5,000 images which are available online in the databases FVC2000, 2002, 2004 [1].

  • PDF

GPU-based Object Extraction for Real-time Analysis of Large-scale Radar Signal (대규모 레이더 신호 데이터의 실시간 분석을 위한 GPU 기반 객체 추출 기법)

  • Kang, Young-Min
    • Journal of Korea Multimedia Society
    • /
    • v.19 no.8
    • /
    • pp.1297-1309
    • /
    • 2016
  • In this paper, an efficient connected component labeling (CCL) method was proposed. The proposed method is based on GPU parallelism. The CCL is very important in various applications where images are analysed. However, the label of each pixel is dependent on the connectivity of adjacent pixels so that it is not very easy to be parallelized. In this paper, a GPU-based parallel CCL techniques were proposed and applied to the analysis of radar signal. Since the radar signals contains complex and large data, the efficiency of the algorithm is crucial when realtime analysis is required. The experimental results show the proposed method is efficient enough to be successfully applied to this application.

Accelerating Soft-Decision Reed-Muller Decoding Using a Graphics Processing Unit

  • Uddin, Md. Sharif;Kim, Cheol Hong;Kim, Jong-Myon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.4 no.2
    • /
    • pp.369-378
    • /
    • 2014
  • The Reed-Muller code is one of the efficient algorithms for multiple bit error correction, however, its high-computation requirement inherent in the decoding process prohibits its use in practical applications. To solve this problem, this paper proposes a graphics processing unit (GPU)-based parallel error control approach using Reed-Muller R(r, m) coding for real-time wireless communication systems. GPU offers a high-throughput parallel computing platform that can achieve the desired high-performance decoding by exploiting massive parallelism inherent in the algorithm. In addition, we compare the performance of the GPU-based approach with the equivalent sequential approach that runs on the traditional CPU. The experimental results indicate that the proposed GPU-based approach exceedingly outperforms the sequential approach in terms of execution time, yielding over 70× speedup.

Molecular Docking System using Parallel GPU (병렬 GPU를 이용한 분자 도킹 시스템)

  • Park, Sung-Jun
    • The Journal of the Korea Contents Association
    • /
    • v.8 no.12
    • /
    • pp.441-448
    • /
    • 2008
  • The molecular docking system needs a large amount of computation and requires super-computing power. Since the experiment requires a large amount of time, the experiment is conducted in the distributed environment or in the grid environment. Recently, researches on using parallel GPU of far higher performance than that of CPU in scientific computing have been very actively conducted. CUDA is an open technique by which a parallel GPU programming is made possible. This study proposes the molecular docking system using CUDA. It also proposes algorithm that parallels energy-minimizing-computation. To verify such experiments, this study conducted a comparative analysis on the time required for experimenting molecular docking in general CPU and the time and performance of the parallel GPU-based molecular docking which is proposed in this study.

Efficient Parallel CUDA Random Number Generator on NVIDIA GPUs (NVIDIA GPU 상에서의 난수 생성을 위한 CUDA 병렬프로그램)

  • Kim, Youngtae;Hwang, Gyuhyeon
    • Journal of KIISE
    • /
    • v.42 no.12
    • /
    • pp.1467-1473
    • /
    • 2015
  • In this paper, we implemented a parallel random number generation program on GPU's, which are known for high performance computing, using LCG (Linear Congruential Generator). Random numbers are important in all fields requiring the use of randomness, and LCG is one of the most widely used methods for the generation of pseudo-random numbers. We explained the parallel program using the NVIDIA CUDA model and MPI(Message Passing Interface) and showed uniform distribution and performance results. We also used a Monte Carlo algorithm to calculate pi(${\pi}$) comparing the parallel random number generator with cuRAND, which is a CUDA library function, and showed that our program is much more efficient. Finally we compared performance results using multi-GPU's with those of ideal speedups.

A CPU and GPU Heterogeneous Computing Techniques for Fast Representation of Thin Features in Liquid Simulations (액체 시뮬레이션의 얇은 특징을 빠르게 표현하기 위한 CPU와 GPU 이기종 컴퓨팅 기술)

  • Kim, Jong-Hyun
    • Journal of the Korea Computer Graphics Society
    • /
    • v.24 no.2
    • /
    • pp.11-20
    • /
    • 2018
  • We propose a new method particle-based method that explicitly preserves thin liquid sheets for animating liquids on CPU-GPU heterogeneous computing framework. Our primary contribution is a particle-based framework that splits at thin points and collapses at dense points to prevent the breakup of liquid on GPU. In contrast to existing surface tracking methods, the our method does not suffer from numerical diffusion or tangles, and robustly handles topology changes on CPU-GPU framework. The thin features are detected by examining stretches of distributions of neighboring particles by performing PCA(Principle component analysis), which is used to reconstruct thin surfaces with anisotropic kernels. The efficiency of the candidate position extraction process to calculate the position of the fluid particle was rapidly improved based on the CPU-GPU heterogeneous computing techniques. Proposed algorithm is intuitively implemented, easy to parallelize and capable of producing quickly detailed thin liquid animations.

A QoS-Aware Energy Optimization Technique for Smartphone GPUs (QoS를 고려한 스마트폰 GPU의 에너지 최적화 기법)

  • Kim, Dohan;Song, Wook;Kim, HyungHoon;Kim, Jihong
    • Journal of KIISE
    • /
    • v.42 no.5
    • /
    • pp.566-572
    • /
    • 2015
  • We proposed a novel energy optimization technique for smartphone GPUs, more aggressively lowering the GPU frequency while obtaining higher energy efficiency with a negligible negative impact on the GPU performance. In order to achieve the Quality of Service (QoS) specified by the smartphone application, the proposed optimization technique employed the minimal acceptable GPU frequency based on average Frames per Second (FPS) for each GPU frequency level. Our experimental results on a smartphone development board showed that the proposed technique can reduce the GPU energy consumption by up to 23% over the default DVFS algorithm with only a 0.45 frame drop.

Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

  • Min, Jun;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Databases
    • /
    • v.37 no.5
    • /
    • pp.275-284
    • /
    • 2010
  • GPUs are stream processors based on multi-cores, which can process large data with a high speed and a large memory bandwidth. Furthermore, GPUs are less expensive than multi-core CPUs. Recently, usage of GPUs in general purpose computing has been wide spread. The CUDA architecture from Nvidia is one of efforts to help developers use GPUs in their application domains. In this paper, we propose techniques to parallelize a skyline algorithm which uses a simple nested loop structure. In order to employ the CUDA programming model, we apply our optimization techniques to make our skyline algorithm fit into the performance restrictions of the CUDA architecture. According to our experimental results, we improve the original skyline algorithm by 80% with our optimization techniques.

Image Feature-Based Real-Time RGB-D 3D SLAM with GPU Acceleration (GPU 가속화를 통한 이미지 특징점 기반 RGB-D 3차원 SLAM)

  • Lee, Donghwa;Kim, Hyongjin;Myung, Hyun
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.19 no.5
    • /
    • pp.457-461
    • /
    • 2013
  • This paper proposes an image feature-based real-time RGB-D (Red-Green-Blue Depth) 3D SLAM (Simultaneous Localization and Mapping) system. RGB-D data from Kinect style sensors contain a 2D image and per-pixel depth information. 6-DOF (Degree-of-Freedom) visual odometry is obtained through the 3D-RANSAC (RANdom SAmple Consensus) algorithm with 2D image features and depth data. For speed up extraction of features, parallel computation is performed with GPU acceleration. After a feature manager detects a loop closure, a graph-based SLAM algorithm optimizes trajectory of the sensor and builds a 3D point cloud based map.