Trends in Lightweight Neural Network Algorithms and Hardware Acceleration Technologies for Transformer-based Deep Neural Networks

  • H.J. Kim (Hyperscale AI Semiconductor Research Laboratory)
  • C.G. Lyuh (Hyperscale AI Semiconductor Research Laboratory)
  • Published: 2023.10.01

Abstract

The development of neural networks is evolving toward the adoption of transformer structures built on attention modules. Accordingly, lightweight neural network algorithms and hardware acceleration techniques originally developed for convolutional neural networks are being actively extended to transformer-based networks. We present a survey of state-of-the-art research on lightweight neural network algorithms and hardware architectures that reduce memory usage and accelerate both inference and training. To describe these trends, we review recent studies on token pruning, quantization, and architecture tuning for vision transformers. In addition, we present a hardware architecture that incorporates lightweight algorithms into artificial intelligence processors to accelerate processing.
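To make two of the surveyed techniques concrete, the sketch below illustrates attention-score-based token pruning, which keeps only the patch tokens that the [CLS] token attends to most strongly, and symmetric per-tensor int8 quantization of a weight tensor. This is a minimal PyTorch sketch under simplifying assumptions, not the method of any specific work cited in the references; the function names (prune_tokens, quantize_int8), the keep_ratio parameter, and the tensor shapes are illustrative.

```python
import torch


def prune_tokens(tokens: torch.Tensor, cls_attention: torch.Tensor, keep_ratio: float = 0.5):
    """Keep only the patch tokens that receive the highest [CLS] attention."""
    # tokens: (B, N, D) patch embeddings, excluding the [CLS] token
    # cls_attention: (B, N) attention weights from [CLS] to each patch token
    _, N, D = tokens.shape
    num_keep = max(1, int(N * keep_ratio))
    _, keep_idx = torch.topk(cls_attention, num_keep, dim=1)   # (B, num_keep)
    keep_idx = keep_idx.unsqueeze(-1).expand(-1, -1, D)        # (B, num_keep, D)
    return torch.gather(tokens, dim=1, index=keep_idx)


def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = w.abs().max().clamp_min(1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale


if __name__ == "__main__":
    tokens = torch.randn(2, 196, 768)     # 14x14 patches, ViT-Base embedding width
    cls_attn = torch.rand(2, 196)
    pruned = prune_tokens(tokens, cls_attn, keep_ratio=0.5)
    print(pruned.shape)                   # torch.Size([2, 98, 768])

    weight = torch.randn(768, 768)
    q, scale = quantize_int8(weight)
    print(q.dtype, float(scale))          # torch.int8 and the per-tensor scale
```

In practice, the surveyed methods go beyond these fixed heuristics: token pruning decisions are typically learned (e.g., with differentiable masks), and quantization scales are usually calibrated per channel or per layer rather than per tensor.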

Acknowledgement

This work was supported by the Ministry of Science and ICT (MSIT) and the Institute of Information & Communications Technology Planning & Evaluation (IITP) as part of the Next-Generation Intelligent Semiconductor Technology Development Program [2020-0-01308, Intelligent Mobile Processor Based on a Deep Learning Ultra-Compact Core Array].

References

  1. T. Brown et al., "Language models are few-shot learners," in Proc. NeurIPS 2020, (Vancouver, Canada), Dec. 2020, pp. 1877-1901.
  2. C.H. Lin et al., "Magic3D: High-resolution text-to-3D content creation," in Proc. IEEE/CVF CVPR 2023, (Vancouver, Canada), June 2023, pp. 300-309.
  3. U. Singer et al., "Make-a-video: Text-to-video generation without text-video data," arXiv preprint, CoRR, 2022, arXiv: 2209.14792.
  4. R. Huang et al., "Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models," arXiv preprint, CoRR, 2023, arXiv: 2301.12661.
  5. A. Vaswani et al., "Attention is all you need," in Proc. NIPS 2017, (Long Beach, CA, USA), Dec. 2017.
  6. https://openai.com/blog/chatgpt
  7. R. Rombach et al., "High-resolution image synthesis with latent diffusion models," in Proc. IEEE/CVF CVPR 2022, (New Orleans, LA, USA), June 2022, pp. 10684-10695.
  8. C. Yu et al., "Boost Vision Transformer with GPU-Friendly Sparsity and Quantization," in Proc. IEEE/CVF CVPR 2023, (Vancouver, Canada), June 2023, pp. 22658-22668.
  9. J. Shin et al., "NIPQ: Noise proxy-based integrated pseudo-quantization," in Proc. IEEE/CVF CVPR 2023, (Vancouver, Canada), June 2023, pp. 3852-3861.
  10. G. Fang et al., "Depgraph: Towards any structural pruning," in Proc. IEEE/CVF CVPR 2023, (Vancouver, Canada), June 2023, pp. 16091-16101.
  11. S. Wei et al., "Joint token pruning and squeezing towards more aggressive compression of vision transformers," in Proc. IEEE/CVF CVPR 2023, (Vancouver, Canada), June 2023, pp. 2092-2101.
  12. Y. Rao et al., "DynamicViT: Efficient vision transformers with dynamic token sparsification," in Proc. NeurIPS 2021, (Virtual-only), Dec. 2021, pp. 13937-13949.
  13. E. Jang, S. Gu, and B. Poole, "Categorical reparameterization with Gumbel-softmax," arXiv preprint, CoRR, 2016, arXiv: 1611.01144.
  14. Y. Liang et al., "Not all patches are what you need: Expediting vision transformers via token reorganizations," arXiv preprint, CoRR, 2022, arXiv: 2202.07800.
  15. H. Yang et al., "Global vision transformer pruning with hessian-aware saliency," in Proc. IEEE/CVF CVPR 2023, (Vancouver, Canada), June 2023, pp. 18547-18557.
  16. S.M. Moosavi-Dezfooli et al., "Robustness via curvature regularization, and vice versa," in Proc. IEEE/CVF CVPR 2019, (Long Beach, CA, USA), June 2019, pp. 9078-9086.
  17. H. Yang et al., "HERO: Hessian-enhanced robust optimization for unifying and improving generalization and quantization performance," arXiv preprint, CoRR, 2021, arXiv: 2111.11986.
  18. S. Yu et al., "Hessian-aware pruning and optimal neural implant," in Proc. IEEE/CVF WACV 2022, (Waikoloa, HI, USA), Jan. 2022, pp. 3880-3891.
  19. H. Wang, Z. Zhang, and S. Han, "SpAtten: Efficient sparse attention architecture with cascade token and head pruning," in Proc. IEEE HPCA 2021, (Seoul, Rep. of Korea), Feb. 2021.
  20. Y. Qin et al., "FACT: FFN-attention Co-optimized transformer architecture with eager correlation prediction," in Proc. ISCA 2023, (Orlando, FL, USA), June 2023, pp. 1-14.