Thread Distribution Method of GP-GPU for Accelerating Parallel Algorithms

  • Received : 2017.03.24
  • Accepted : 2017.03.29
  • Published : 2017.03.31

Abstract

In this paper, we propose a method for improving the performance of a small-area GP-GPU. Instead of adopting a superscalar design, which increases scheduling complexity, we increase the number of simple cores to maximize performance, and we simplify the structure of the Stream Processors that make up the GP-GPU. We further improve performance by developing a thread assignment method in the Warp Scheduler that is suited to the target application. To verify performance, we propose a thread assignment scheme for deep learning, a branch of neural networks. For a neural network algorithm, we observed performance improvements of 90% over an Intel CPU and 98% over a 4-core ARM Cortex-A15.
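
The abstract does not describe the thread assignment scheme in detail, so the following is only an illustrative sketch of one common way to map neural network work onto warps: one thread per output neuron of a fully connected layer. The warp width of 32 and the names `assign_threads` and `layer_forward` are assumptions made for this example and are not taken from the paper.

```python
WARP_SIZE = 32  # assumed warp width; the abstract does not state one


def assign_threads(num_neurons, warp_size=WARP_SIZE):
    """Map each output neuron to a (warp, lane) pair, one thread per neuron."""
    return [(n // warp_size, n % warp_size) for n in range(num_neurons)]


def layer_forward(weights, inputs, warp_size=WARP_SIZE):
    """Compute a fully connected layer (no activation function).

    Each loop iteration stands in for the work of one hardware thread;
    on a real GP-GPU all lanes of a warp would run in lockstep.
    """
    outputs = [0.0] * len(weights)
    assignment = assign_threads(len(weights), warp_size)
    for neuron, (warp, lane) in enumerate(assignment):
        # The thread at (warp, lane) computes the weighted sum for its neuron.
        outputs[neuron] = sum(w * x for w, x in zip(weights[neuron], inputs))
    return outputs
```

With this mapping, a layer of 40 neurons would occupy one full warp plus 8 lanes of a second warp. How the paper's Warp Scheduler actually handles such partially filled warps is not stated in the abstract.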
