Design and Implementation of Accelerator Architecture for Binary Weight Network on FPGA with Limited Resources

  • Received : 2020.03.06
  • Accepted : 2020.03.23
  • Published : 2020.03.31

Abstract

In this paper, we propose a method to accelerate a binary weight network (BWN) on an FPGA with limited resources for embedded systems. Because the number of available logic elements is limited, a single computing unit capable of handling Conv and FC layers of various sizes is designed and reused. When the input feature map cannot be processed fully in parallel in one pass, the input is read several times and the intermediate partial sums are accumulated to produce the final output. Since the number of available BRAM modules is limited, the number of data bits inside the BWN accelerator is minimized. In image classification, the implemented BWN accelerator is faster than an embedded CPU and a desktop PC, and about 50% slower than a GPU system running at a much higher clock rate. Since the BWN accelerator uses a slow 50 MHz clock, it is advantageous in terms of performance per watt.

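The tiling scheme described above, re-reading the input when it cannot be processed in one pass and summing the intermediate partial results into the final output, can be illustrated with a short software model. The sketch below is only a minimal NumPy analogue of the idea, not the paper's hardware design; the function name, tile size, and layer shapes are illustrative assumptions. It also reflects the basic BWN property that, with weights constrained to +1/-1, multiply-accumulate reduces to add/subtract.

    import numpy as np

    # Illustrative software sketch (hypothetical names), not the paper's RTL:
    # process the input feature map in channel tiles and accumulate the
    # partial sums of each pass into the final output.
    def bwn_conv_tiled(ifmap, weights, tile_channels=16):
        """ifmap: (C, H_in, W_in) activations; weights: (K, C, kh, kw), entries in {-1, +1}."""
        K, C, kh, kw = weights.shape
        H = ifmap.shape[1] - kh + 1          # output height (valid convolution)
        W = ifmap.shape[2] - kw + 1          # output width
        out = np.zeros((K, H, W))            # accumulator for the partial sums
        for c0 in range(0, C, tile_channels):    # re-read the input one tile at a time
            c1 = min(c0 + tile_channels, C)
            for k in range(K):
                for y in range(H):
                    for x in range(W):
                        patch = ifmap[c0:c1, y:y+kh, x:x+kw]
                        # Binary weights turn multiply-accumulate into add/subtract:
                        # +1 weights add the activation, -1 weights subtract it.
                        out[k, y, x] += np.sum(np.where(weights[k, c0:c1] > 0, patch, -patch))
        return out

    # The tiled result matches a single-pass computation: each tile contributes
    # an intermediate partial sum, and the sum over all tiles equals the full dot product.
    x = np.random.randn(32, 8, 8)                                # 32-channel input feature map
    w = np.where(np.random.randn(16, 32, 3, 3) > 0, 1.0, -1.0)   # binary weights in {-1, +1}
    y = bwn_conv_tiled(x, w)                                     # output shape (16, 6, 6)

An FC layer is the same accumulation with a 1x1 kernel over a flattened input, which is consistent with the abstract's point that one compute unit can be reused for both Conv and FC layers of various sizes.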