Design of an Optimized GPGPU for Data Reuse in DeepLearning Convolution

  • Received : 2021.12.13
  • Accepted : 2021.12.16
  • Published : 2021.12.31

Abstract

This paper proposes a GPGPU architecture that reduces the number of operations and memory accesses by applying a data reuse method to convolutional neural networks (CNNs). Convolution is a two-dimensional operation on a kernel and input data, computed by sliding the kernel across the input. Rather than fetching the kernel from cache memory throughout the convolution, the proposed method keeps it in internal registers and reuses it until the operation completes. Because a GPGPU executes instructions in SIMT fashion, a serial operation method is applied to the convolution to increase the benefit of data reuse. To support register-based data reuse, the kernel size is fixed at 4×4, and the GPGPU is designed with a warp size and register bank organization that support this configuration efficiently. To verify the performance of the designed GPGPU on CNNs, it was implemented on an FPGA, LeNet was run on it, and its performance on AlexNet was measured by a comparison method using TensorFlow. The measured results for AlexNet are 0.468 s per training iteration and 0.135 s for inference.
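The register-based kernel reuse described above can be illustrated with a small software sketch. This is not the paper's hardware design: it does not model SIMT execution, warps, register banks, or the serial operation method. `CountingMemory`, `conv_reload`, and `conv_register_reuse` are hypothetical names introduced here, and the baseline that reloads every weight for every window is an assumption made only for comparison. The sketch simply counts how many kernel loads from memory each approach needs for a 4×4 kernel.

```python
K = 4  # kernel size fixed at 4x4, as in the abstract


class CountingMemory:
    """Hypothetical stand-in for cache memory that counts element loads."""

    def __init__(self, values):
        self.values = list(values)
        self.loads = 0

    def load(self, index):
        self.loads += 1
        return self.values[index]


def conv_reload(image, kernel_mem):
    """Assumed baseline: fetch each weight from memory for every window."""
    rows = len(image) - K + 1
    cols = len(image[0]) - K + 1
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for i in range(K):
                for j in range(K):
                    out[r][c] += image[r + i][c + j] * kernel_mem.load(i * K + j)
    return out


def conv_register_reuse(image, kernel_mem):
    """Load the 16 weights once into locals ("registers"), then slide."""
    regs = [kernel_mem.load(n) for n in range(K * K)]  # one-time kernel load
    rows = len(image) - K + 1
    cols = len(image[0]) - K + 1
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for i in range(K):
                for j in range(K):
                    out[r][c] += image[r + i][c + j] * regs[i * K + j]
    return out


image = [[r * 6 + c for c in range(6)] for r in range(6)]  # 6x6 toy input
weights = list(range(K * K))                                # toy 4x4 kernel

baseline_mem = CountingMemory(weights)
reuse_mem = CountingMemory(weights)
baseline_out = conv_reload(image, baseline_mem)
reuse_out = conv_register_reuse(image, reuse_mem)

print(baseline_out == reuse_out)            # True
print(baseline_mem.loads, reuse_mem.loads)  # 144 16
```

For the 6×6 toy input there are 9 sliding windows, so the assumed baseline performs 9 × 16 = 144 kernel loads while the reuse variant performs 16, with identical outputs. The gap grows with the number of windows, which is the effect the register-based reuse is meant to exploit.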

Keywords

Acknowledgement

This work was supported by Seokyeong University in 2021 and by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. 2016-0-00204, Development of mobile GPU hardware for photo-realistic realtime virtual reality).
