Acknowledgement
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00769: Neuromorphic Computing Software Platform for Artificial Intelligence Systems and No. 2022-0-00454: Technology Development of Smart Edge Device SW Development Platform).
References
- HISILICON, Kirin, 2022. https://www.hisilicon.com/en/products/Kirin
- NVIDIA, Jetson, 2022. https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/
- Samsung, Exynos, 2022. https://semiconductor.samsung.com/processor/mobile-processor/
- T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, H. Shen, M. Cowan, L. Wang, Y. Hu, L. Ceze, and C. Guestrin, TVM: An automated end-to-end optimizing compiler for deep learning, (13th USENIX Symposium on Operating Systems Design and Implementation, Carlsbad, CA, USA), 2018, pp. 578-594.
- S. Cyphers, A. K. Bansal, A. Bhiwandiwalla, J. Bobba, M. Brookhart, A. Chakraborty, W. Constable, C. Convey, L. Cook, O. Kanawi, R. Kimball et al., Intel nGraph: An intermediate representation, compiler, and executor for deep learning, arXiv preprint, 2018. https://doi.org/10.48550/arXiv.1801.08058
- C. Leary and T. Wang, XLA: TensorFlow, compiled, TensorFlow Dev Summit, 2017.
- W.-F. Lin, D.-Y. Tsai, L. Tang, C.-T. Hsieh, C.-Y. Chou, P.-H. Chang, and L. Hsu, ONNC: A compilation framework connecting ONNX to proprietary deep learning accelerators, (IEEE International Conference on Artificial Intelligence Circuits and Systems, Hsinchu, Taiwan), 2019, pp. 214-218.
- N. Rotem, J. Fix, S. Abdulrasool, G. Catron, S. Deng, R. Dzhabarov, N. Gibson, J. Hegeman, M. Lele, R. Levenstein, and J. Montgomery, Glow: Graph lowering compiler techniques for neural networks, arXiv preprint, 2018. https://doi.org/10.48550/arXiv.1805.00907
- M. Zhang, Z. Hu, and M. Li, DUET: A compiler-runtime subgraph scheduling approach for tensor programs on a coupled CPU-GPU architecture, (IEEE International Parallel and Distributed Processing Symposium, Portland, OR, USA), 2021, pp. 151-161.
- ETRI, NEST-C, 2021. https://github.com/etri/nest-compiler
- Y. Ding, L. Zhu, Z. Jia, G. Pekhimenko, and S. Han, IOS: Interoperator scheduler for CNN acceleration, Proc. Machine Learn. Syst. 3 (2021), 167-180.
- L. Ma, Z. Xie, Z. Yang, J. Xue, Y. Miao, W. Cui, W. Hu, F. Yang, L. Zhang, and L. Zhou, RAMMER: Enabling holistic deep learning compiler optimizations with rTasks, (14th USENIX Symposium on Operating Systems Design and Implementation), 2020, pp. 881-897.
- T. Moreau, T. Chen, Z. Jiang, L. Ceze, C. Guestrin, and A. Krishnamurthy, VTA: An open hardware-software stack for deep learning, arXiv preprint, 2018. https://doi.org/10.48550/arXiv.1807.04188
- ONNX, ONNX operators, 2022. https://github.com/onnx/onnx/blob/main/docs/Operators.md
- Y. Xing, S. Liang, L. Sui, X. Jia, J. Qiu, X. Liu, Y. Wang, Y. Shan, and Y. Wang, DNNVM: End-to-end compiler leveraging heterogeneous optimizations on FPGA-based CNN accelerators, IEEE Trans. Comput.-Aided Design Integrated Circ. Syst. 39 (2020), no. 10, 2668-2681. https://doi.org/10.1109/TCAD.2019.2930577
- J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, ImageNet: A large-scale hierarchical image database, (IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA), 2009, pp. 248-255.
- M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, European Conference on Computer Vision, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars (eds.), Springer, Cham, 2014, pp. 818-833.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (2017), no. 6, 84-90. https://doi.org/10.1145/3065386
- C. Szegedy, W. Liu, Y. Jia, et al., Going deeper with convolutions, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA), 2015, pp. 1-9.
- K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA), 2016, pp. 770-778.
- S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, Aggregated residual transformations for deep neural networks, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA), 2017, pp. 1492-1500.
- F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, arXiv preprint, 2016. https://doi.org/10.48550/arXiv.1602.07360
- ONNX, ONNX model zoo, 2022. https://github.com/onnx/models
- N. Vasilache, O. Zinenko, T. Theodoridis, P. Goyal, Z. DeVito, W. S. Moses, S. Verdoolaege, A. Adams, and A. Cohen, Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions, arXiv preprint, 2018. https://doi.org/10.48550/arXiv.1802.04730
- N. P. Jouppi, C. Young, N. Patil, et al., In-datacenter performance analysis of a tensor processing unit, (Proceedings of the 44th Annual International Symposium on Computer Architecture, Association for Computing Machinery, Toronto, Canada), 2017, pp. 1-12.
- Z. Chen, C. H. Yu, T. Morris, J. Tuyls, Y. H. Lai, J. Roesch, E. Delaye, V. Sharma, and Y. Wang, Bring your own codegen to deep learning compiler, arXiv preprint, 2021. https://doi.org/10.48550/arXiv.2105.03215