• Title/Summary/Keyword: CPU-GPU

Quantitative Analysis on Performance and Power Consumption of GPU varying to Frequency (GPU의 성능과 소비전력에 대한 동작 주파수의 영향 분석)

  • Joo, Se-Yoon; Choi, Hong-Jun; Kim, Cheol-Hong
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.203-205 / 2012
  • In recent computer systems, the power consumption and heat that accompany higher clock frequencies mean that performance can no longer depend on the CPU alone. This has driven growing interest in techniques that apply the GPU's parallel processing capability to general-purpose data processing that would otherwise fall to the CPU. However, fully utilizing all the resources of both the CPU and the GPU raises problems of high temperature and increased power draw. This paper therefore analyzes which GPU operating frequency is optimal in terms of power efficiency and performance. Using CUDA, an API for programming the GPU, we analyze how performance, power consumption, and Energy Delay change as the GPU operating frequency varies. Experimental results show that increasing the operating frequency improved performance by more than 30% at most, while power consumption increased by up to about 18%. Energy Delay also improved by up to 21%.
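
As a concrete illustration of how the paper's three metrics relate, the sketch below derives energy and the energy-delay product from a measured runtime and average power at each clock setting. The structure and numbers are hypothetical, not the paper's data:

```cuda
#include <cstdio>

// Minimal sketch of the metrics compared at each GPU clock: runtime
// (delay), average power, energy = power * delay, and the
// energy-delay product EDP = energy * delay.
struct ClockSample {
    double freq_mhz;   // GPU core clock under test
    double runtime_s;  // measured kernel runtime (delay)
    double power_w;    // measured average board power
};

int main() {
    // Hypothetical measurements at two operating frequencies.
    ClockSample samples[] = { {600.0, 1.30, 90.0}, {800.0, 1.00, 106.0} };
    for (const ClockSample& s : samples) {
        double energy = s.power_w * s.runtime_s;   // joules
        double edp    = energy * s.runtime_s;      // joule-seconds
        printf("%.0f MHz: delay %.2f s, energy %.1f J, EDP %.1f J*s\n",
               s.freq_mhz, s.runtime_s, energy, edp);
    }
    return 0;
}
```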

Development of Real-Time Image Processing System Using GPU (GPU를 이용한 실시간 이미지 프로세싱 시스템)

  • Oh Jae-Hong; Kang Hoon; Lee Ja-Yong
    • Journal of Institute of Control, Robotics and Systems / v.11 no.5 / pp.393-397 / 2005
  • When a real-time image processing application is implemented on a general-purpose computer, the CPU (Central Processing Unit) is usually heavily loaded, and in many cases the CPU alone cannot meet the real-time requirement at all. Most modern computers are equipped with powerful Graphics Processing Units (GPUs) to accelerate graphics operations, and the computing power of the GPU is outgrowing that of the CPU. If we take advantage of the GPU for general operations beyond pure graphics, processing time can be reduced. In this study, we present techniques that apply the GPU to general operations such as image processing procedures. Our experimental results show that a significant speed-up can be achieved by using the GPU.
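
The paper predates CUDA and drove the GPU through graphics operations, but the data-parallel pattern it exploits (one thread per pixel) looks like this in modern CUDA terms. A minimal sketch, not the authors' implementation:

```cuda
#include <cuda_runtime.h>

// Hypothetical per-pixel image operation (here: intensity inversion).
// Each GPU thread processes one pixel, which is the data-parallel
// pattern that makes image processing a natural fit for the GPU.
__global__ void invert(unsigned char* img, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) img[i] = 255 - img[i];
}

void invert_on_gpu(unsigned char* host_img, int n) {
    unsigned char* d_img;
    cudaMalloc(&d_img, n);
    cudaMemcpy(d_img, host_img, n, cudaMemcpyHostToDevice);
    invert<<<(n + 255) / 256, 256>>>(d_img, n);
    cudaMemcpy(host_img, d_img, n, cudaMemcpyDeviceToHost);
    cudaFree(d_img);
}
```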

Software Method for Improving the Performance of Real-time Rendering (실시간 렌더링의 속도 향상을 위한 소프트웨어적 기법)

  • Han, Young-Min; Hwang, Seok-Min; Sung, Mee-Young
    • Proceedings of the Korean Information Science Society Conference / 2005.11a / pp.757-759 / 2005
  • Conventional rendering proceeds along a pipeline of application → geometry → rasterization stages. As graphics cards have advanced, the GPU has taken over the computation of the geometry stage, reducing the CPU's load and freeing it for other work. This division of labor, however, introduces a bottleneck in which the CPU and GPU wait for each other to finish, and this bottleneck undermines efficient rendering. The goal of this work is to speed up graphics output in real-time rendering by reducing the bottleneck that arises while the CPU and GPU operate in parallel. To that end, this paper shows that performance can be improved by replacing the synchronous handling between the CPU and GPU during graphics output, which is normally handled in hardware, with an asynchronous software technique.
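
The paper targets the graphics pipeline of 2005, but the asynchronous idea it describes has a direct modern analogue in CUDA streams: enqueue the transfer and kernel, keep the CPU busy, and synchronize only when the result is needed. A minimal sketch under those assumptions (the kernel and buffer names are placeholders, and the host buffer must be pinned for the copies to be truly asynchronous):

```cuda
#include <cuda_runtime.h>

__global__ void process(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;  // placeholder GPU work
}

// Instead of the CPU blocking until the GPU finishes, work is enqueued
// on a stream and the CPU continues with its own computation,
// synchronizing only at the point where the result is consumed.
void async_pipeline(float* pinned_host, float* device, int n) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaMemcpyAsync(device, pinned_host, n * sizeof(float),
                    cudaMemcpyHostToDevice, stream);
    process<<<(n + 255) / 256, 256, 0, stream>>>(device, n);
    cudaMemcpyAsync(pinned_host, device, n * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);

    // ... CPU does independent work here instead of waiting ...

    cudaStreamSynchronize(stream);  // wait only when the result is needed
    cudaStreamDestroy(stream);
}
```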

A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems (방출단층촬영 시스템을 위한 GPU 기반 반복적 기댓값 최대화 재구성 알고리즘 연구)

  • Ha, Woo-Seok; Kim, Soo-Mee; Park, Min-Jae; Lee, Dong-Soo; Lee, Jae-Sung
    • Nuclear Medicine and Molecular Imaging / v.43 no.5 / pp.459-467 / 2009
  • Purpose: Maximum likelihood-expectation maximization (ML-EM) is a statistical reconstruction algorithm derived from a probabilistic model of the emission and detection processes. Although ML-EM has many advantages in accuracy and utility, its use is limited by the computational burden of iterative processing on a CPU (central processing unit). In this study, we developed a parallel computing technique on the GPU (graphics processing unit) for the ML-EM algorithm. Materials and Methods: Using a GeForce 9800 GTX+ graphics card and NVIDIA's CUDA (compute unified device architecture), the projection and backprojection steps of the ML-EM algorithm were parallelized. We measured the time spent per iteration on projection, on computing the errors between measured and estimated data, and on backprojection. Total time included the latency of data transmission between RAM and GPU memory. Results: The total computation times of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 sec, respectively, a speed-up of about 15 times on the GPU. When the number of iterations increased to 1024, the CPU- and GPU-based computations took 18 min and 8 sec in total, respectively, an improvement of about 135 times, caused by growing delays in the CPU-based computation after a certain number of iterations. The GPU-based computation, on the other hand, showed very little variation in time per iteration owing to its use of shared memory. Conclusion: GPU-based parallel computation significantly improved the speed and stability of ML-EM reconstruction, and the developed GPU-based ML-EM algorithm can easily be modified for other imaging geometries.
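
The core of each ML-EM iteration is a multiplicative voxel-wise update, which parallelizes naturally with one thread per voxel. A minimal sketch of that step only (the projection and backprojection kernels the paper parallelizes are omitted, and all names are illustrative):

```cuda
#include <cuda_runtime.h>

// One ML-EM image update: f_new = f_old * BP(measured / Pf) / sens.
// ratio_backproj holds the backprojection of measured-to-estimated
// sinogram ratios; sens holds the per-voxel sensitivity. Both are
// assumed to be produced by separate (parallelized) projection kernels.
__global__ void mlem_update(float* image, const float* ratio_backproj,
                            const float* sens, int n_voxels) {
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v < n_voxels && sens[v] > 0.0f) {
        image[v] *= ratio_backproj[v] / sens[v];
    }
}
```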

Implementation of GPU System for SDR in WiBro Environment (WiBro 환경에서 SDR을 위한 GPU 시스템 구현)

  • Ahn, Sung-Soo; Lee, Jung-Suk
    • 전자공학회논문지 IE / v.48 no.3 / pp.20-25 / 2011
  • We developed a method for accelerating communication systems for SDR (Software Defined Radio) in a WiBro environment. In this paper, we propose a new scheme that uses a GPU (Graphics Processing Unit) to implement a communication system with SDR functionality. Communication systems are typically built with a DSP (Digital Signal Processor) or an FPGA (Field Programmable Gate Array), but these approaches suffer from implementation and debugging difficulties caused by the characteristics of each processor. The GPU is well suited to vector processing because it consists of multiple processors, each composed of a set of threads. We also developed a framework that uses GPU and CPU resources effectively to reduce computation time. Various simulations confirm that the GPU system performs well in the WiBro system.

Efficient Task Distribution for Pig Monitoring Applications Using OpenCL (OpenCL을 이용한 돈사 감시 응용의 효율적인 태스크 분배)

  • Kim, Jinseong; Choi, Younchang; Kim, Jaehak; Chung, Yeonwoo; Chung, Yongwha; Park, Daihee; Kim, Hakjae
    • KIPS Transactions on Computer and Communication Systems / v.6 no.10 / pp.407-414 / 2017
  • Pig monitoring applications consisting of many tasks can take advantage of inherent data parallelism and be accelerated with performance accelerators. In this paper, we propose a method for distributing the tasks of a pig monitoring application across a heterogeneous computing platform consisting of a multicore CPU and a manycore GPU. That is, a parallel program written in OpenCL is developed, and the most suitable processor for each task is then determined from its measured execution time. The proposed method is simple but very effective, and it can be applied to parallelize other many-task applications on a heterogeneous CPU-GPU platform. Experimental results on three different heterogeneous computing platforms show that the proposed task distribution method improves on the typical GPU-only approach, in which every task is executed on the GPU device, by factors of 1.5, 8.7, and 2.7, respectively.
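
The distribution logic itself is simple enough to sketch: time each task on both devices and pin it to the faster one. The sketch below shows only that host-side logic, with placeholder callables standing in for the paper's OpenCL task implementations; names and structure are assumptions for illustration:

```cuda
#include <chrono>
#include <cstdio>
#include <functional>
#include <vector>

// A task with interchangeable CPU and GPU implementations. The run_cpu
// and run_gpu callables are placeholders for real OpenCL task code.
struct Task {
    const char* name;
    std::function<void()> run_cpu, run_gpu;
    bool use_gpu = false;
};

static double time_ms(const std::function<void()>& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Measure each task once per device and assign it to the faster one.
void assign_devices(std::vector<Task>& tasks) {
    for (Task& t : tasks) {
        double cpu_ms = time_ms(t.run_cpu);
        double gpu_ms = time_ms(t.run_gpu);
        t.use_gpu = gpu_ms < cpu_ms;
        printf("%s -> %s (CPU %.2f ms, GPU %.2f ms)\n",
               t.name, t.use_gpu ? "GPU" : "CPU", cpu_ms, gpu_ms);
    }
}
```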

High-Speed Implementations of Block Ciphers on Graphics Processing Units Using CUDA Library (GPU용 연산 라이브러리 CUDA를 이용한 블록암호 고속 구현)

  • Yeom, Yong-Jin; Cho, Yong-Kuk
    • Journal of the Korea Institute of Information Security & Cryptology / v.18 no.3 / pp.23-32 / 2008
  • The computing power of graphics processing units (GPUs) has already surpassed that of CPUs, and the gap is widening. Research on GPGPU, which applies the GPU to general-purpose computation, has accordingly become popular and has shown great success, especially in the field of parallel data processing. Since the first implementations of cryptographic algorithms on GPUs by Cook et al. in 2005, improved results using graphics libraries such as OpenGL and DirectX have been published. In this paper, we present techniques and results for implementing block ciphers using the CUDA library announced by NVIDIA in 2007. We also discuss a general method for converting CPU source code of a block cipher into GPU code. On an NVIDIA 8800GTX GPU, the resulting throughputs of the block ciphers AES, ARIA, and DES are 4.5 Gbps, 7.0 Gbps, and 2.8 Gbps, respectively, all faster than on a CPU.
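
Such speed-ups rest on a simple parallelization pattern: in a parallelizable mode such as ECB or CTR, every 16-byte block is independent, so each GPU thread can encrypt one block. A minimal CUDA sketch with a toy stand-in for the real round function (a production AES or ARIA kernel would run the full round structure, typically staging its lookup tables in shared or constant memory):

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Toy stand-in for a real cipher round function such as AES or ARIA.
__device__ void encrypt_block(uint8_t* block, const uint8_t* round_keys) {
    for (int i = 0; i < 16; ++i) block[i] ^= round_keys[i];
}

// One thread per 16-byte block: the independence of blocks in a
// parallelizable mode is what maps block ciphers onto the GPU.
__global__ void encrypt_blocks(uint8_t* data, const uint8_t* round_keys,
                               int n_blocks) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b < n_blocks) encrypt_block(data + 16 * b, round_keys);
}
```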

Parallel Algorithm of Conjugate Gradient Solver using OpenGL Compute Shader

  • Va, Hongly; Lee, Do-keyong; Hong, Min
    • Journal of the Korea Society of Computer and Information / v.26 no.1 / pp.1-9 / 2021
  • The OpenGL compute shader is a shader stage that operates differently from the other shader stages and can be used to perform arbitrary data-parallel computation. This paper proposes a GPU-based parallel algorithm that solves sparse linear systems with the conjugate gradient iterative method, performing the computation in an OpenGL compute shader. Such sparse linear solvers are used to solve large linear systems, for example those with symmetric positive definite matrices. Four well-known matrix storage formats (Dense, COO, ELL, and CSR) were used. A performance comparison across experiments with eight sparse matrices shows that the GPU-based solver is much faster than the CPU-based solver, with best average computing times of 0.64 ms on the GPU versus 15.37 ms on the CPU.
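
The kernel at the heart of each conjugate gradient iteration is the sparse matrix-vector product. Below is a minimal sketch of CSR SpMV with one thread per row; the paper implements this pattern in an OpenGL compute shader, and CUDA is used here only to illustrate the same parallel structure:

```cuda
#include <cuda_runtime.h>

// CSR sparse matrix-vector product y = A*x. row_ptr[r]..row_ptr[r+1]
// delimits row r's nonzeros in values/col_idx. One thread handles one row.
__global__ void csr_spmv(int n_rows, const int* row_ptr, const int* col_idx,
                         const float* values, const float* x, float* y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        float sum = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += values[j] * x[col_idx[j]];
        y[row] = sum;
    }
}
```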

A 2D GPU-Accelerated High Resolution Numerical Scheme for Solving Diffusive Wave Equation (고해상도 수치기법을 이용한 GPU 기반 2D 확산파 모형)

  • Park, Seonryang; Kim, Dae-Hong
    • Proceedings of the Korea Water Resources Association Conference / 2019.05a / pp.109-109 / 2019
  • This study developed a GPU-based diffusive wave model for simulating rainfall-runoff processes. The diffusive wave equation is solved with a finite volume scheme, and the physical quantities at each cell interface are reconstructed with the MUSCL scheme using the van Leer TVD limiter. The Horton model is used to account for infiltration. Rainfall-runoff simulations were run on a 1D single overland plane and a 2D V-shaped overland domain, and the model's accuracy was verified against an analytical solution and against numerical results from a dynamic wave model, respectively. The model was also applied to strongly undulating 1D and 2D terrain to verify that it produces physically reasonable solutions of the rainfall-runoff process, and finally to a complex real-world terrain, where comparison with measurements confirmed the suitability of the diffusive wave model for real watersheds. The GPU-based model was implemented with an NVIDIA GeForce GTX 1050 GPU and NVIDIA's CUDA-Fortran, which exposes the GPU's parallel processing capability. Comparing computation speed on a Windows PC against a CPU-based model (Intel i7, 4.70 GHz), the efficiency of the GPU-based model over the CPU-based model grew with the grid dimensions, reaching a maximum speed-up of about 150 times on a 3200×3200 grid.
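
The MUSCL reconstruction with the van Leer limiter named in the abstract is compact enough to sketch. The paper's implementation is in CUDA-Fortran; the CUDA C++ sketch below shows only the 1D limiter and the limited left-interface reconstruction under the usual definitions (fluxes, the 2D extension, and infiltration are omitted):

```cuda
#include <cmath>

// Van Leer TVD limiter: phi(r) = (r + |r|) / (1 + |r|). It vanishes for
// r <= 0 (local extrema) and approaches 2 as r grows, keeping the
// reconstruction total-variation diminishing.
__host__ __device__ inline float van_leer(float r) {
    return (r + fabsf(r)) / (1.0f + fabsf(r));
}

// MUSCL reconstruction of the left state at interface i+1/2 from the
// cell averages u[i-1], u[i], u[i+1], applied at each cell interface
// before the finite volume flux evaluation.
__host__ __device__ inline float muscl_left(float um1, float u, float up1) {
    float denom = up1 - u;
    if (fabsf(denom) < 1e-12f) return u;       // flat region: no slope
    float r = (u - um1) / denom;               // smoothness ratio
    return u + 0.5f * van_leer(r) * denom;     // limited reconstruction
}
```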

CPU Parallel Processing and GPU-accelerated Processing of UHD Video Sequence using HEVC (HEVC를 이용한 UHD 영상의 CPU 병렬처리 및 GPU가속처리)

  • Hong, Sung-Wook; Lee, Yung-Lyul
    • Journal of Broadcast Engineering / v.18 no.6 / pp.816-822 / 2013
  • The latest video coding standard, HEVC, was developed jointly by the JCT-VC (Joint Collaborative Team on Video Coding) of ITU-T VCEG and ISO/IEC MPEG. HEVC reduces the BD-Bitrate by about 50% compared with the H.264/AVC standard, but the various coding tools that deliver these gains increase computational complexity. The proposed method reduces the complexity of HEVC by combining CPU parallel processing with GPU-accelerated processing. Experiments on UHD (3840×2144) video sequences achieve 15 fps encoding/decoding performance with the proposed method. We expect that hardware improvements in CPU-GPU data transfer rates will further reduce encoding/decoding times.