Analysis of Deep learning Quantization Technology for Micro-sized IoT devices

YoungMin KIM; KyungHyun Han; Seong Oun Hwang

doi:10.20465/KIOTS.2023.9.1.009

Journal of Internet of Things and Convergence

Vol. 9, No. 1 / pp. 9-17 / 2023 / 2799-4791 (pISSN)

The Korea Internet of Things Society

YoungMin KIM (Department of IT Convergence Engineering, Gachon University) ;
KyungHyun Han (Department of Electronics and Computer Engineering, Hongik University) ;
Seong Oun Hwang (Department of Computer Engineering, Gachon University)

Received: 2022.11.03
Reviewed: 2022.12.14
Published: 2023.02.28

https://doi.org/10.20465/KIOTS.2023.9.1.009

Abstract

Deep learning requires a large amount of computation, which makes it difficult to implement on micro-sized IoT devices or mobile devices. Recently, lightweight deep learning techniques have been introduced that reduce the computation of a model so that deep learning can run even on such devices. Quantization is one such technique: it represents parameter values that follow a continuous distribution as discrete values with a fixed number of bits, which reduces the memory use and size of the model. However, the discrete representation lowers the accuracy of the model. In this paper, we introduce various quantization techniques that can improve accuracy. We selected APoT and EWGS from among the existing quantization techniques and compared and analyzed their results through experiments in the same environment. The selected techniques were trained and tested with ResNet models on the CIFAR-10 or CIFAR-100 dataset. By analyzing the experimental results, we identified problems with the existing quantization techniques and presented directions for future research.
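To make the trade-off described in the abstract concrete, the following is a minimal sketch of plain uniform quantization, assuming a simple min/max range. It is not the APoT or EWGS method evaluated in the paper. The function names and the sample weights are hypothetical and chosen only for illustration.

```python
def quantize_uniform(values, num_bits):
    """Map floats to at most 2**num_bits evenly spaced levels spanning [min, max]."""
    steps = 2 ** num_bits - 1                  # number of intervals between the extremes
    lo, hi = min(values), max(values)
    scale = (hi - lo) / steps if hi > lo else 1.0
    # Integer codes in [0, steps]; these are what a device would store.
    codes = [round((v - lo) / scale) for v in values]
    # Discrete approximations of the original continuous values.
    dequantized = [lo + c * scale for c in codes]
    return codes, dequantized


def max_abs_error(values, approximations):
    """Largest absolute gap between a value and its quantized approximation."""
    return max(abs(v - a) for v, a in zip(values, approximations))


# Hypothetical parameter values, not taken from the paper.
weights = [-0.82, -0.31, -0.05, 0.0, 0.12, 0.47, 0.9]

for bits in (8, 4, 2):
    codes, approx = quantize_uniform(weights, bits)
    print(bits, codes, round(max_abs_error(weights, approx), 4))
```

With fewer bits, each parameter needs less storage, but the approximation error grows. That growing error is the source of the accuracy loss that the techniques discussed in the paper aim to reduce.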

Funding Information

This work was supported by the Gachon University research fund of 2021 (GCU-202104500001).

References

Howard, Andrew G., et al. "MobileNets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861, 2017.
Blalock, Davis, et al. "What is the state of neural network pruning?" Proceedings of Machine Learning and Systems 2, pp.129-146, 2020.
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531, 2015.
Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. "Binarized neural networks." Advances in Neural Information Processing Systems 29, 2016.
Raghuraman Krishnamoorthi. "Quantizing deep convolutional networks for efficient inference: A whitepaper." arXiv preprint arXiv:1806.08342, 2018.
Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. "Quantization and training of neural networks for efficient integer-arithmetic-only inference." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2704-2713, 2018.
Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, and Paulius Micikevicius. "Integer quantization for deep learning inference: Principles and empirical evaluation." arXiv preprint arXiv:2004.09602, 2020.
Song Han, Huizi Mao, and William J Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding." arXiv preprint arXiv:1510.00149, 2015.
Zhaohui Yang, Yunhe Wang, Kai Han, Chunjing Xu, Chao Xu, Dacheng Tao, and Chang Xu. "Searching for low-bit weights in quantized neural networks." Advances in Neural Information Processing Systems 33, pp.4091-4102, 2020.
Kohei Yamamoto. "Learnable companding quantization for accurate low-bit neural networks." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.5029-5038, 2021.
Yunchao Gong, Liu Liu, Ming Yang, and Lubomir Bourdev. "Compressing deep convolutional networks using vector quantization." arXiv preprint arXiv:1412.6115, 2014.
Yang, Jiwei, et al. "Quantization networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.7308-7316, 2019.
Gong, Ruihao, et al. "Differentiable soft quantization: Bridging full-precision and low-bit neural networks." Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.4852-4861, 2019.
Kim, Dohyung, Junghyup Lee, and Bumsub Ham. "Distance-aware quantization." Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.5271-5280, 2021.
Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, and Yurong Chen. "Incremental network quantization: Towards lossless CNNs with low-precision weights." arXiv preprint arXiv:1702.03044, 2017.
Yuhang Li, Xin Dong, and Wei Wang. "Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks." In International Conference on Learning Representations, 2020.
Lee, Junghyup, Dohyung Kim, and Bumsub Ham. "Network quantization with element-wise gradient scaling." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.6448-6457, 2021.
Yoshua Bengio, Nicholas Leonard, and Aaron Courville. "Estimating or propagating gradients through stochastic neurons for conditional computation." arXiv preprint arXiv:1308.3432, 2013.
Avron, Haim, and Sivan Toledo. "Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix." Journal of the ACM (JACM), Vol.58, No.2, pp.1-34, 2011. https://doi.org/10.1145/1944345.1944349
Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, and Daniel Soudry. "Improving post training neural quantization: Layer-wise calibration and integer programming." arXiv preprint arXiv:2006.10518, 2020.
Markus Nagel, Mart van Baalen, Tijmen Blankevoort, and Max Welling. "Data-free quantization through weight equalization and bias correction." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.1325-1334, 2019.
Li, Yuhang, et al. "BRECQ: Pushing the limit of post-training quantization by block reconstruction." arXiv preprint arXiv:2102.05426, 2021.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.770-778, 2016.
Alex Krizhevsky, Geoffrey Hinton, et al. "Learning multiple layers of features from tiny images." 2009.
Lee, Junghyup, et al. "SFNet: Learning object-aware semantic correspondence." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.2278-2287, 2019.
