Acknowledgement
This study was supported by a grant from the Institute of Information & Communications Technology Planning & Evaluation (IITP), funded by the Korean government (MSIT) (No. RS-2023-00277060, Development of OpenEdge AI SoC hardware and software platform).
References
- A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, PyTorch: an imperative style, high-performance deep learning library, (Proc. 33rd Int. Conf. Neural Inf. Process. Syst., Vol. 32, Curran Associates Inc., Red Hook, NY, USA), 2019.
- ONNX Contributors, Open Neural Network Exchange (ONNX), 2024. https://github.com/onnx/onnx. Accessed: 2024-03-18.
- M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, and M. Isard, TensorFlow: a system for large-scale machine learning, (12th USENIX Symp. Operating Syst. Des. Implementation (OSDI'16), Savannah, GA, USA), 2016, pp. 265-283.
- M. Li, Y. Liu, X. Liu, Q. Sun, X. You, H. Yang, Z. Luan, L. Gan, G. Yang, and D. Qian, The deep learning compiler: a comprehensive survey, IEEE Trans. Parallel Distrib. Syst. 32 (2021), no. 3, 708-727.
- T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, H. Shen, M. Cowan, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy, TVM: an automated end-to-end optimizing compiler for deep learning, (13th USENIX Symp. Operating Syst. Des. Implementation (OSDI'18), Carlsbad, CA, USA), 2018, pp. 578-594.
- N. Rotem, J. Fix, S. Abdulrasool, G. Catron, S. Deng, R. Dzhabarov, N. Gibson, J. Hegeman, M. Lele, R. Levenstein, et al., Glow: graph lowering compiler techniques for neural networks, arXiv preprint, 2018. https://doi.org/10.48550/arXiv.1805.00907
- C. Leary and T. Wang, XLA: TensorFlow, compiled, TensorFlow Dev Summit, 2017.
- AiM Future, The future of artificial intelligence: AiM Future's product lineup, 2023. https://aimfuture.ai. Accessed: 2024-03-22.
- OPENEDGES, Neural processing unit (NPU) IP-ENLIGHT, 2022. https://www.openedges.com/npu. Accessed: 2024-03-22.
- J.-W. Jang, S. Lee, D. Kim, H. Park, A. S. Ardestani, Y. Choi, C. Kim, Y. Kim, H. Yu, H. Abdel-Aziz, J.-S. Park, H. Lee, D. Lee, M. W. Kim, H. Jung, H. Nam, D. Lim, S. Lee, J.-H. Song, S. Kwon, J. Hassoun, S. Lim, and C. Choi, Sparsity-aware and re-configurable NPU architecture for Samsung Flagship Mobile SoC, (ACM/IEEE 48th Annu. Int. Symp. Comput. Archit., Valencia, Spain), 2021, pp. 15-28.
- Qualcomm, Unlocking on-device generative AI with an NPU and heterogeneous computing, 2024. https://www.qualcomm.com. Accessed: 2024-03-22.
- Apple, Deploying transformers on the Apple Neural Engine, 2023. https://machinelearning.apple.com/research/deploying-transformers-on-the-apple-neural-engine. Accessed: 2024-03-22.
- Y. Kwon, K. Vladimir, N. Kim, W. Shin, J. Won, M. Lee, H. Joo, H. Choi, G. Kim, B. An, et al., System architecture and software stack for GDDR6-AiM, (IEEE Hot Chips 34 Symp., Cupertino, CA, USA), 2022, pp. 1-25.
- Samsung, HBM-PIM: cutting-edge memory technology to accelerate next-generation AI, 2023. https://semiconductor.samsung.com/. Accessed: 2024-03-18.
- ETRI, NEST-C. https://gitlab.com/ones-ai/nest-compiler. Accessed: 2024-03-22.
- C. Lattner and V. Adve, LLVM: a compilation framework for lifelong program analysis & transformation, (Int. Symp. Code Gener. Optim., San Jose, CA, USA), 2004, pp. 75-86.
- J. Roesch, S. Lyubomirsky, L. Weber, J. Pollock, M. Kirisame, T. Chen, and Z. Tatlock, Relay: a new IR for machine learning frameworks, (Proc. 2nd ACM SIGPLAN Int. Workshop Mach. Learn. Program. Lang., Association for Computing Machinery, Philadelphia, PA, USA), 2018, pp. 58-68.
- J. Dean, Machine learning for systems and systems for machine learning, Presentation at Conf. Neural Inf. Process. Syst., Long Beach, CA, USA, 2017.
- Meta, Glow's Graph IR optimization. https://github.com/pytorch/glow/blob/master/docs/Optimizations.md. Accessed: 2024-02-22.
- J. Lee, M. Yu, Y. Kwon, and T. Kim, Quantune: post-training quantization of convolutional neural networks using extreme gradient boosting for fast deployment, Future Gener. Comput. Syst. 132 (2022), 124-135.
- M. Yu, Y. Kwon, J. Lee, J. Park, J. Park, and T. Kim, PartitionTuner: an operator scheduler for deep-learning compilers supporting multiple heterogeneous processing units, ETRI J. 45 (2023), no. 2, 318-328.
- R. Sousa, M. Pereira, Y. Kwon, T. Kim, N. Jung, C. S. Kim, M. Frank, and G. Araujo, Tensor slicing and optimization for multicore NPUs, J. Parallel Distrib. Comput. 175 (2023), 66-79.
- T. Moreau, T. Chen, L. Vega, J. Roesch, E. Yan, L. Zheng, J. Fromm, Z. Jiang, L. Ceze, C. Guestrin, and A. Krishnamurthy, A hardware-software blueprint for flexible deep learning specialization, IEEE Micro 39 (2019), no. 5, 8-16.
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ImageNet: a large-scale hierarchical image database, (IEEE Conf. Comput. Vision Pattern Recognit., Miami, FL, USA), 2009, pp. 248-255.
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, Going deeper with convolutions, (Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), Boston, MA, USA), 2015, pp. 1-9.
- S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, Aggregated residual transformations for deep neural networks, (IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), Honolulu, HI, USA), 2017, pp. 1492-1500.
- K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, (IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), Las Vegas, NV, USA), 2016, pp. 770-778.
- L. Deng, The MNIST database of handwritten digit images for machine learning research, IEEE Signal Process. Mag. 29 (2012), no. 6, 141-142.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86 (1998), no. 11, 2278-2324.
- F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, arXiv preprint, 2016. https://doi.org/10.48550/arXiv.1602.07360