MATE: Memory- and Retraining-Free Error Correction for Convolutional Neural Network Weights

  • Jang, Myeungjae (School of Computing, Korea Advanced Institute of Science and Technology)
  • Hong, Jeongkyu (Department of Computer Engineering, Yeungnam University)
  • Received : 2021.01.19
  • Accepted : 2021.03.02
  • Published : 2021.03.31

Abstract

Convolutional neural networks (CNNs) are among the most frequently used artificial intelligence techniques. Among CNN-based applications, small, timing-sensitive applications have emerged that must operate reliably to prevent severe accidents. However, because such small, timing-sensitive systems lack sufficient resources, they typically cannot afford proper error protection schemes. In this paper, we propose MATE, a low-cost error correction technique for CNN weights. Based on the observation that not all mantissa bits are closely related to accuracy, MATE replaces some of the mantissa bits in each weight with error correction codes. MATE can therefore provide strong data protection without requiring additional memory space or modifying the memory architecture. Experimental results demonstrate that, on erroneous DRAM, MATE retains nearly the same accuracy as the ideal error-free case, and maintains approximately 60% accuracy even at extremely high bit error rates.
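
As a rough illustration of the mechanism the abstract describes, the sketch below protects each 32-bit weight by repurposing its low mantissa bits as an in-place error correction code. The specific layout (8 ECC bits, a Hamming single-error-correcting code over the remaining 24 bits) and the helper names protect_weight/recover_weight are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch of mantissa-substituted ECC, assuming IEEE 754 single
# precision and a Hamming SEC code chosen for illustration; MATE's actual
# bit layout and ECC construction may differ.
import struct

ECC_BITS = 8                  # low mantissa bits repurposed for ECC (assumption)
K = 32 - ECC_BITS             # protected payload: sign, exponent, upper mantissa
R = 5                         # parity bits for SEC over K=24 bits (2^5 >= 24+5+1)
N = K + R                     # virtual Hamming codeword length

def f32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]

def _expand(payload: int) -> list:
    """Scatter the K payload bits over the non-power-of-two codeword positions."""
    code = [0] * (N + 1)       # 1-indexed; power-of-two positions hold parity
    j = 0
    for pos in range(1, N + 1):
        if pos & (pos - 1):    # not a power of two -> data position
            code[pos] = (payload >> j) & 1
            j += 1
    return code

def ecc_encode(payload: int) -> int:
    """Return the R Hamming parity bits for the payload."""
    code = _expand(payload)
    par = 0
    for i in range(R):
        p = 1 << i
        bit = 0
        for pos in range(1, N + 1):
            if pos & p and pos != p:
                bit ^= code[pos]
        par |= bit << i
    return par

def ecc_decode(payload: int, par: int) -> int:
    """Correct up to one flipped payload bit using the stored parity."""
    code = _expand(payload)
    for i in range(R):
        code[1 << i] = (par >> i) & 1
    syndrome = 0
    for i in range(R):
        p = 1 << i
        bit = 0
        for pos in range(1, N + 1):
            if pos & p:
                bit ^= code[pos]
        if bit:
            syndrome |= p
    # Correct only data positions; a power-of-two syndrome means the flipped
    # bit was a parity bit, so the payload itself is already intact.
    if syndrome and syndrome <= N and syndrome & (syndrome - 1):
        code[syndrome] ^= 1
    out, j = 0, 0
    for pos in range(1, N + 1):
        if pos & (pos - 1):
            out |= code[pos] << j
            j += 1
    return out

def protect_weight(x: float) -> float:
    """Overwrite the low mantissa bits with ECC before the weight goes to DRAM.
    The stored value differs from x only in those low mantissa bits."""
    payload = f32_to_bits(x) >> ECC_BITS
    return bits_to_f32((payload << ECC_BITS) | ecc_encode(payload))

def recover_weight(x: float) -> float:
    """Check/correct a weight read from DRAM; low mantissa bits stay zero."""
    bits = f32_to_bits(x)
    payload = ecc_decode(bits >> ECC_BITS, bits & ((1 << ECC_BITS) - 1))
    return bits_to_f32(payload << ECC_BITS)

if __name__ == "__main__":
    w = 0.15625
    stored = protect_weight(w)                                # as held in DRAM
    corrupted = bits_to_f32(f32_to_bits(stored) ^ (1 << 20))  # flip a payload bit
    print(w, recover_weight(corrupted))                       # 0.15625 0.15625
```

The low mantissa bits can be sacrificed because overwriting them perturbs a weight by at most roughly 2^-15 relative to its magnitude, which is negligible for CNN inference; that is the observation MATE builds on, and it is why the scheme needs no memory beyond the weights themselves.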
