Acknowledgement
This work was supported by the Next-generation Intelligence Semiconductor Foundation grant funded by the Korea government (MSIT) (no. 2020-0-01308, Intelligent Mobile Processor based on Deep-Learning Micro Core Array).
References
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, Adv. Neural Inf. Process. Syst. 30 (2017), 261-272.
- A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, and S. Gelly, An image is worth 16 × 16 words: transformers for image recognition at scale, arXiv preprint, 2020. https://doi.org/10.48550/arXiv.2010.11929
- Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, Swin transformer: hierarchical vision transformer using shifted windows, (Proceedings of the IEEE/CVF Int. Conf. on Computer Vision, Montreal, Canada), 2021, pp. 10012-10022.
- A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, Language models are unsupervised multitask learners, OpenAI blog 1 (2019), no. 8, 9.
- A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, Zero-shot text-to-image generation, (Int. Conf. Mach. Learn., Virtual Only), 2021, pp. 8821-8831.
- Y. C. P. Cho, J. Chung, J. Yang, C.-G. Lyuh, H. Kim, C. Kim, J. Ham, M. Choi, K. Shin, J. Han, and Y. Kwon, AB9: A neural processor for inference acceleration, ETRI J. 42 (2020), no. 4, 491-504.
- J. Chung, H. Kim, K. Shin, C.-G. Lyuh, Y. C. P. Cho, J. Han, Y. Kwon, Y.-H. Gong, and S. W. Chung, A layer-wise frequency scaling for a neural processing unit, ETRI J. 44 (2022), no. 5, 849-858.
- W. Jeon, Y. C. P. Cho, H. M. Kim, H. Kim, J. Chung, J. Kim, M. Lee, C.-G. Lyuh, J. Han, and Y. Kwon, M3FPU: multiformat matrix multiplication FPU architectures for neural network computations, (IEEE 4th Int. Conf. Artif. Intell. Circuits Syst., Incheon, Rep. of Korea), 2022, pp. 150-153.
- N. Jouppi, G. Kurian, S. Li, P. Ma, R. Nagarajan, L. Nai, N. Patil, S. Subramanian, A. Swing, B. Towles, and C. Young, TPU v4: an optically reconfigurable supercomputer for machine learning with hardware support for embeddings, (Proc. 50th Annual Int. Symp. Comput. Archit., Orlando, FL, USA), 2023, pp. 1-14.
- H. Kim, C.-G. Lyuh, and Y. Kwon, Automated optimization for memory-efficient high-performance deep neural network accelerators, ETRI J. 42 (2020), no. 4, 505-517.
- Hugging Face, GPT-2, 2024. Available from: https://huggingface.co/openai-community/gpt2 [last accessed March 2024].
- E. Frantar, S. Ashkboos, T. Hoefler, and D. Alistarh, GPTQ: accurate post-training quantization for generative pre-trained transformers, arXiv preprint, 2022. https://doi.org/10.48550/arXiv.2210.17323
- J. Lin, J. Tang, H. Tang, S. Yang, W.-M. Chen, W.-C. Wang, G. Xiao, X. Dang, C. Gan, and S. Han, AWQ: activation-aware weight quantization for on-device LLM compression and acceleration, Proc. Mach. Learn. Syst. 6 (2024), 87-100.
- B. D. Rouhani, R. Zhao, A. More, M. Hall, A. Khodamoradi, S. Deng, D. Choudhary, M. Cornea, E. Dellinger, K. Denolf, and S. Dusan, Microscaling data formats for deep learning, arXiv preprint, 2023. https://doi.org/10.48550/arXiv.2310.10537
- X. Sun, J. Choi, C.-Y. Chen, N. Wang, S. Venkataramani, V. V. Srinivasan, X. Cui, W. Zhang, and K. Gopalakrishnan, Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks, Adv. Neural Inf. Process. Syst. 32 (2019), 4900-4909.
- H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Roziere, N. Goyal, E. Hambro, F. Azhar, and A. Rodriguez, LLaMA: open and efficient foundation language models, arXiv preprint, 2023. https://doi.org/10.48550/arXiv.2302.13971
- Hugging Face, GPT-2 medium, 2024. Available from: https://huggingface.co/openai-community/gpt2-medium [last accessed March 2024].
- Hugging Face, GPT-2 large, 2024. Available from: https://huggingface.co/openai-community/gpt2-large [last accessed March 2024].
- D. Kalamkar, D. Mudigere, N. Mellempudi, D. Das, K. Banerjee, S. Avancha, D. T. Vooturi, N. Jammalamadaka, J. Huang, H. Yuen, and J. Yang, A study of BFLOAT16 for deep learning training, arXiv preprint, 2019. https://doi.org/10.48550/arXiv.1905.12322
- P. Micikevicius, D. Stosic, N. Burgess, M. Cornea, P. Dubey, R. Grisenthwaite, S. Ha, A. Heinecke, P. Judd, J. Kamalu, and N. Mellempudi, FP8 formats for deep learning, arXiv preprint, 2022. https://doi.org/10.48550/arXiv.2209.05433
- X. Sun, N. Wang, C.-Y. Chen, J. Ni, A. Agrawal, X. Cui, S. Venkataramani, K. El Maghraoui, V. V. Srinivasan, and K. Gopalakrishnan, Ultra-low precision 4-bit training of deep neural networks, Adv. Neural Inf. Process. Syst. 33 (2020), 1796-1807.