• Title/Summary/Keyword: GP-GPU

GP-GPU based Parallelization for Urban Terrain Atmospheric Model CFD_NIMR (도시기상모델 CFD_NIMR의 GP-GPU 실행을 위한 병렬 프로그램의 구현)

  • Kim, Youngtae;Park, Hyeja;Choi, Young-Jeen
    • Journal of Internet Computing and Services / v.15 no.2 / pp.41-47 / 2014
  • In this paper, we implemented a CUDA Fortran parallel program to run the CFD_NIMR model, which simulates air diffusion over urban terrain, on GP-GPUs. A GP-GPU is a graphics processing unit in the form of a PCI card that serves as a general-purpose calculation accelerator, performing large amounts of high-speed computation at low cost and low power consumption. The GP-GPU version ran about 15 times faster on an Nvidia Tesla C1060 GPU than on an Intel Xeon 2.0 GHz CPU. In addition, the GP-GPU program performed efficiently compared with an MPI parallel program running on multiple CPUs. We expect that the proposed programming method can be applied to numerical models with a similar structure.

Thread Distribution Method of GP-GPU for Accelerating Parallel Algorithms (병렬 알고리즘의 가속화를 위한 GP-GPU의 Thread할당 기법)

  • Lee, Kwan-Ho;Kim, Chi-Yong
    • Journal of IKEEE / v.21 no.1 / pp.92-95 / 2017
  • In this paper, we proposed a way to improve the performance of a small-scale GP-GPU. Instead of using a superscalar design, which increases scheduling complexity, we suggested applying simple cores to maximize GP-GPU performance. Our study also showed that a simplified stream processor is one way to improve GP-GPU performance. In addition, we found that developing an optimal thread-assignment method in the warp scheduler for a specific application improves GP-GPU performance. To evaluate GP-GPU performance, we proposed a thread-assignment method tailored to a deep-learning system, a branch of neural networks. As a result, we found that the performance index for the neural network algorithm increased to 90% and 98% compared with an Intel CPU and a 4-core ARM Cortex-A15, respectively.

Implementation of a 3D Graphics Simulator for GP-GPU (GP-GPU 개발을 위한 3차원 그래픽 시뮬레이터 구현)

  • Yeo, Dong-young;Kim, Woo-young;Jung, Hyung-Ki;Lee, Kwang-Yeob
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.10a / pp.337-340 / 2009
  • The performance of the GPU (Graphics Processing Unit), a hardware accelerator for 3D graphics processing, has been improving constantly. GPUs were introduced as an efficient way to handle complex graphics applications, but an application rarely uses 100% of the GPU's resources. The GP-GPU (general-purpose GPU), which handles graphics operations and also supports the general operations normally handled by a processor, has drawn attention because its resources can be controlled effectively depending on how they are distributed. In this paper, we implemented a simulator that provides a virtual GP-GPU environment and can be used for program design and debugging. With it, a co-design development environment supports simultaneous design and fast, reliable verification, and can be used to build the interface of a three-dimensional graphics display.

Implementation of parallel blocked LU decomposition program for utilizing cache memory on GP-GPUs (GP-GPU의 캐시메모리를 활용하기 위한 병렬 블록 LU 분해 프로그램의 구현)

  • Kim, Youngtae;Kim, Doo-Han;Yu, Myoung-Han
    • Journal of Internet Computing and Services / v.14 no.6 / pp.41-47 / 2013
  • GP-GPUs are general-purpose GPUs, originally designed for graphics processing, that perform numerical computation using multiple threads. GP-GPUs provide cache memory in the form of shared memory, which, unlike typical cache memory, user programs can access directly. In this research, we implemented a parallel blocked LU decomposition program that utilizes the cache memory in GP-GPUs. The parallel blocked LU decomposition program, written in Nvidia CUDA C, runs 7 to 8 times faster than a non-blocked LU decomposition program in the same GP-GPU computation environment.
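
As a rough illustration of the blocking idea in this abstract, the Python sketch below factors a matrix tile by tile, so that each step touches a block small enough to sit in a fast memory such as GPU shared memory. It is not the authors' CUDA C program: the function names, the block-size parameter `bs`, and the absence of pivoting are assumptions made for the example.

```python
def lu_unblocked(a):
    """Doolittle LU without pivoting (assumption: no zero pivots occur).
    Returns one matrix holding L below the diagonal and U on and above it."""
    n = len(a)
    a = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a


def lu_blocked(a, bs):
    """Same factorization, organized in bs-wide blocks (hypothetical helper)."""
    n = len(a)
    a = [row[:] for row in a]
    for k0 in range(0, n, bs):
        k1 = min(k0 + bs, n)
        # Step 1: factor the column panel (diagonal block and rows below it).
        for k in range(k0, k1):
            for i in range(k + 1, n):
                a[i][k] /= a[k][k]
                for j in range(k + 1, k1):
                    a[i][j] -= a[i][k] * a[k][j]
        # Step 2: forward-substitute to get the row block of U right of the panel.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    a[i][j] -= a[i][k] * a[k][j]
        # Step 3: update the trailing submatrix with a block matrix product.
        for i in range(k1, n):
            for j in range(k1, n):
                s = 0.0
                for k in range(k0, k1):
                    s += a[i][k] * a[k][j]
                a[i][j] -= s
    return a
```

Step 3 is a plain matrix product over small tiles, which is the part that benefits most from keeping operands in directly addressable fast memory. The two functions should agree up to floating-point rounding on matrices that need no pivoting, for example diagonally dominant ones.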

Pedestrian Inference Convolution Neural Network Using GP-GPU (GP-GPU를 이용한 보행자 추론 CNN)

  • Jeong, Junmo
    • Journal of IKEEE / v.21 no.3 / pp.244-247 / 2017
  • In this paper, we implemented a convolutional neural network using a GP-GPU. After defining the network structure, the CNN performed inference on the 256-thread GP-GPU from our previous study, using the weights obtained from training. Training was done with an Intel i7-4470 CPU and Matlab, using the Daimler Pedestrian Dataset. The GP-GPU is controlled by the PC over PCIe and runs on an FPGA. We assigned threads according to the depth and size of each layer. For the pooling layer, we used overlapping pooling, which performs additional operations on the horizontal and vertical regions. One inference takes about 12 ms.
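
The pooling step can be sketched as follows, assuming the abstract refers to overlapping pooling, where the window is larger than the stride so that neighboring windows share rows and columns. The abstract does not give the pooling operator or sizes; max pooling with a 3x3 window and stride 2 is an assumption chosen for illustration, and `overlapping_max_pool` is a hypothetical name.

```python
def overlapping_max_pool(img, window=3, stride=2):
    """Max pooling where window > stride, so adjacent windows overlap.
    Assumed parameters: 3x3 window, stride 2 (not stated in the source)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(0, h - window + 1, stride):
        row = []
        for c in range(0, w - window + 1, stride):
            # Each output reads a full window; the overlap with neighboring
            # windows is the extra horizontal and vertical work.
            row.append(max(img[i][j]
                           for i in range(r, r + window)
                           for j in range(c, c + window)))
        out.append(row)
    return out
```

With a 5x5 input this produces a 2x2 output, and the middle row and column of the input are read by more than one window.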

Implementation of IQ/IDCT in H.264/AVC Decoder Using GP-GPU (GP-GPU를 이용한 H.264/AVC 디코더의 IQ/IDCT구현)

  • Jeong, Jun-Mo;Lee, Kwang-Yeob
    • Journal of IKEEE / v.14 no.2 / pp.76-81 / 2010
  • The need for dedicated hardware continues to decrease as mobile CPU performance increases, but there is a limit to a mobile CPU's performance. GP-GPU (General-Purpose computing on Graphics Processing Units) can improve performance without adding other dedicated hardware. This paper presents an implementation of the Inverse Quantization, Inverse DCT, and Color Space Conversion modules of an H.264/AVC decoder using a GP-GPU for mobile environments. The proposed architecture improves performance by approximately 40% when all of its features are used.

Geometry Processing using Multi-Core GP-GPU (멀티코어 GP-GPU를 이용한 지오메트리 처리)

  • Lee, Kwang-Yeob;Kim, Chi-Yong
    • Journal of IKEEE / v.14 no.2 / pp.69-75 / 2010
  • A 3D graphics pipeline is broadly divided into a geometry stage and a rendering stage. In this paper, we propose a method that accelerates geometry processing on a multi-core GP-GPU using a dual-phase structure. Geometry processing is improved through parallel data processing with the GP-GPU's SIMD, the dual-phase structure, and memory prefetching. The proposed architecture improves performance by approximately 19% when all of its features are used.

The parallelization of binarization using a GP-GPU

  • Han, Seong Hyeon;Yoo, Suk Won
    • International Journal of Advanced Culture Technology / v.4 no.4 / pp.57-63 / 2016
  • In this paper, we propose an optimized binarization on the GP-GPU. Because binarization is easily parallelized, we propose two binarization methods that utilize the GP-GPU. The first method divides the work into data load, subtraction and conversion, and data store. The second method processes these steps collectively. The second method was 2.52 times faster than the first. After synthesizing the GP-GPU on an FPGA, we compared binarization on the GP-GPU with binarization on the ODROID XU. Binarization on the GP-GPU was 1.89 times faster than on the ODROID XU.
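
The difference between the two methods can be sketched in Python as a staged version with separate passes versus a fused single pass. The abstract does not specify the threshold rule or the output values; comparing against a fixed threshold and emitting 0 or 255 are assumptions, as are the function names.

```python
def binarize_staged(pixels, threshold):
    """First method (as assumed here): separate load, subtract/convert, store."""
    loaded = list(pixels)                       # data load
    diffs = [p - threshold for p in loaded]     # subtraction
    converted = [255 if d >= 0 else 0 for d in diffs]  # conversion
    return list(converted)                      # data store


def binarize_fused(pixels, threshold):
    """Second method (as assumed here): all steps done together per pixel."""
    return [255 if p - threshold >= 0 else 0 for p in pixels]
```

Both functions return the same result; the fused form avoids the intermediate buffers, which is one plausible reason a collective approach runs faster on parallel hardware.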

A Design of a High Performance Stream Processor without Superscalar Architecture (슈퍼스칼라 구조를 갖지 않는 고성능 Stream Processor 설계)

  • Lee, Kwan-Ho;Kim, Chi-Yong
    • Journal of IKEEE / v.21 no.1 / pp.77-80 / 2017
  • In this paper, we proposed a way to improve GP-GPU performance by removing superscalar issue from the original design. First, we simplified the structure of the stream processor to eliminate superscalar issue. Under this condition, keeping the hardware size the same while increasing the number of threads improved GP-GPU performance. Because the number of threads grew, we proposed a new warp scheduler model that manages groups of threads. This warp scheduler, without superscalar issue, dispatches instructions to the warp activated by round-robin scheduling. A performance comparison using Gaussian filtering showed that the newly designed GP-GPU performed 7.89 times better than the original.
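
Round-robin warp selection of the kind mentioned in this abstract can be sketched as below. This is a generic illustration, not the paper's scheduler: the one-warp-per-cycle issue model, the `active` flag list, and the function names are assumptions.

```python
def next_warp(active, last):
    """Pick the next active warp after `last`, wrapping around.
    Returns None when no warp is active."""
    n = len(active)
    for step in range(1, n + 1):
        cand = (last + step) % n
        if active[cand]:
            return cand
    return None


def issue_order(active, cycles):
    """Warp ids issued over `cycles` cycles, one warp per cycle (assumed)."""
    order = []
    last = -1  # start so that warp 0 is considered first
    for _ in range(cycles):
        warp = next_warp(active, last)
        if warp is None:
            break
        order.append(warp)
        last = warp
    return order
```

Because only one warp issues per cycle, the scheduler needs no dependency checks between simultaneously issued instructions, which is the scheduling complexity a superscalar design would add.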

The Pixel Shading on Multi Core GP-GPU with Dual Phase Architecture (듀얼 페이즈 구조의 멀티 코어 GP-GPU를 이용한 픽셀 셰이딩)

  • Kim, Jun-Seo;Park, Tae-Ryong;Lee, Kwang-Yeob
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.10a / pp.339-342 / 2010
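  • As processors have recently run into the limits of clock-speed scaling, methods based on multi-core parallel processing have been proposed to improve processor performance. In this paper, we designed an efficient pixel shading method for a multi-core, multi-threaded environment using a GP-GPU (General Purpose - Graphics Processing Unit) that has a MIMD (Multiple Instruction, Multiple Data) architecture, in which multiple arithmetic units can be used simultaneously in one instruction cycle, and that assigns work to multiple cores and threads using a Scratch Counter. For linear fog pixel shading, a single core achieved 18.3 FPS, while a 4-core GP-GPU achieved 73.2 FPS, a fourfold increase.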
