Conv-XP Pruning of CNN Suitable for Accelerator

  • Woo, Yonggeun (School of Computer Science and Engineering, Korea University of Technology and Education)
  • Kang, Hyeong-Ju (School of Computer Science and Engineering, Korea University of Technology and Education)
  • Received : 2018.11.06
  • Accepted : 2018.11.24
  • Published : 2019.01.31

Abstract

Convolutional neural networks (CNNs) achieve high performance in computer vision, but they require an enormous number of operations, which makes them unsuitable for resource- or energy-constrained settings such as embedded environments. To overcome this problem, there has been much research on CNN accelerators and on CNN pruning. Previous pruning schemes do not consider the architecture of CNN accelerators, so accelerators for conventionally pruned CNNs suffer from inefficiency. This paper proposes a new pruning scheme, Conv-XP, which takes the accelerator architecture into account: each convolution kernel is pruned following one of only two patterns, an 'X' shape or a '+' shape. This restriction allows a simple accelerator architecture. Experimental results show that Conv-XP does not degrade CNN accuracy, while reducing the accelerator area by 12.8%.
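To make the pattern restriction concrete, the following NumPy sketch prunes each 3x3 kernel of a convolution layer to one of the two Conv-XP patterns of Fig. 3. The L1-magnitude rule for choosing between the 'X' and '+' masks, and the name conv_xp_prune, are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

# Candidate Conv-XP masks for a 3x3 kernel (1 = weight kept, 0 = pruned),
# following Fig. 3: five weights survive in either an 'X' or a '+' shape.
X_MASK = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]], dtype=np.float32)
PLUS_MASK = np.array([[0, 1, 0],
                      [1, 1, 1],
                      [0, 1, 0]], dtype=np.float32)

def conv_xp_prune(weights):
    """Prune a conv weight tensor of shape (out_ch, in_ch, 3, 3).

    Each 3x3 kernel keeps only the 'X' or '+' positions. The
    L1-magnitude rule used to pick between the two masks is a
    hypothetical stand-in for the paper's selection criterion.
    """
    out_ch, in_ch, kh, kw = weights.shape
    assert (kh, kw) == (3, 3), "Conv-XP targets 3x3 kernels"
    pruned = weights.copy()
    for o in range(out_ch):
        for i in range(in_ch):
            k = weights[o, i]
            x_score = np.abs(k * X_MASK).sum()
            plus_score = np.abs(k * PLUS_MASK).sum()
            pruned[o, i] = k * (X_MASK if x_score >= plus_score else PLUS_MASK)
    return pruned

# Example: prune a random 64x3x3x3 layer; 4 of 9 weights per kernel become zero.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
w_pruned = conv_xp_prune(w)
assert (w_pruned == 0).sum() == 64 * 3 * 4
```

In a full pipeline, the pruned network would then typically be fine-tuned with the masks held fixed, as in standard pruning methods such as [7].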

Fig. 1 Operation architecture of the accelerator in [21]

Fig. 2 Multiplier connection of a sparse accelerator for the conventionally pruned networks

Fig. 3 Conv-XP pruning pattern: (a) 'X' pattern and (b) '+' pattern

Fig. 4 Multiplier connection of a sparse accelerator for the networks pruned by Conv-XP
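The contrast between Fig. 2 and Fig. 4 can be summarized in a few lines: with unstructured pruning, each multiplier must be steered to an arbitrary activation by per-weight indices, whereas under Conv-XP five multipliers per 3x3 kernel suffice and a single pattern bit selects between two fixed wirings. The sketch below is a hypothetical software analogy of that datapath, assuming the surviving weights occupy the five 'X' or '+' positions; it is not the paper's actual hardware design.

```python
# Fixed positions of the five surviving weights in a 3x3 window (row, col).
X_POS = ((0, 0), (0, 2), (1, 1), (2, 0), (2, 2))
PLUS_POS = ((0, 1), (1, 0), (1, 1), (1, 2), (2, 1))

def conv_xp_dot(window, kept_weights, pattern_is_x):
    """Model of the five-multiplier datapath: every multiplier occupies a
    fixed slot, and one pattern bit selects which fixed activation it
    reads, so no per-weight index decoding is required."""
    pos = X_POS if pattern_is_x else PLUS_POS
    return sum(w * window[r][c] for w, (r, c) in zip(kept_weights, pos))

acts = [[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]
print(conv_xp_dot(acts, kept_weights=[0.1] * 5, pattern_is_x=True))  # 2.5
```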

Table 1 Classification accuracy comparison on VGG16 and ResNet-50

Table 2 Area comparison (μm²)

References

  1. Y.-J. Kim and E.-G. Kim, "Image based Fire Detection using Convolutional Neural Network," Journal of the Korea Institute of Information and Communication Engineering, vol. 20, no. 9, pp. 1649-1656, Sep. 2016. https://doi.org/10.6109/jkiice.2016.20.9.1649
  2. S.-H. Kwon, K.-W. Park, and B.-H. Chang, "A Comparison of Predicting Movie Success between Artificial Neural Network and Decision Tree," Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology, vol. 7, no. 4, pp. 593-601, Apr. 2017.
  3. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
  4. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proceedings of International Conference on Learning Representations, pp. 1-14, 2015.
  5. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of Computer Vision and Pattern Recognition, pp. 1-9, 2015.
  6. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of Computer Vision and Pattern Recognition, pp. 770-778, 2016.
  7. S. Han, J. Pool, J. Tran, and W. J. Dally, "Learning both weights and connections for efficient neural networks," in Proceedings of Advances in Neural Information Processing Systems, pp. 1135-1143, 2015.
  8. S. Han, H. Mao, and W. J. Dally, "Deep Compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding," in Proceedings of International Conference on Learning Representations, pp. 1-14, 2016.
  9. W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li, "Learning structured sparsity in deep neural networks," in Proceedings of Advances in Neural Information Processing Systems, pp. 2074-2082, 2016.
  10. N. Yu, S. Qiu, X. Hu, and J. Li, "Accelerating convolutional neural networks by group-wise 2D-filter pruning," in Proceedings of International Joint Conference on Neural Networks, pp. 2502-2509, 2017.
  11. H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, "Pruning filters for efficient ConvNets," in Proceedings of International Conference on Learning Representations, pp. 1-13, 2017.
  12. Y. He, X. Zhang, and J. Sun, "Channel pruning for accelerating very deep neural networks," in Proceedings of International Conference on Computer Vision, pp. 1398-1406, 2017.
  13. J. Yu, A. Lukefahr, D. Palframan, G. Dasika, R. Das, and S. Mahlke, "Scalpel: Customizing DNN pruning to the underlying hardware parallelism," in Proceedings of International Symposium on Computer Architecture, pp. 548-560, 2017.
  14. P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz, "Pruning convolutional neural networks for resource efficient inference," in Proceedings of International Conference on Learning Representations, pp. 1-17, 2017.
  15. T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning," in Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 269-283, 2014.
  16. Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun, and O. Temam, "DaDianNao: A machine-learning supercomputer," in Proceedings of International Symposium on Microarchitecture, pp. 609-622, 2014.
  17. S. Liu, Z. Du, J. Tao, D. Han, T. Luo, Y. Xie, Y. Chen, and T. Chen, "Cambricon: An instruction set architecture for neural networks," in Proceedings of International Symposium on Computer Architecture, pp. 393-405, 2016.
  18. C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, "Optimizing FPGA-based accelerator design for deep convolutional neural networks," in Proceedings of ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 161-170, 2015.
  19. J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger, and A. Moshovos, "Cnvlutin: Ineffectual-neuron-free deep neural network computing," in Proceedings of International Symposium on Computer Architecture, pp. 1-13, 2016.
  20. L. Song, Y. Wang, Y. Han, X. Zhao, B. Liu, and X. Li, "C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization," in Proceedings of Design Automation Conference, pp. 123:1-123:6, 2016.
  21. J. Qiu, J. Wang, S. Yao, K. Guo, B. Li, E. Zhou, J. Yu, T. Tang, N. Xu, S. Song, Y. Wang, and H. Yang, "Going deeper with embedded FPGA platform for convolutional neural network," in Proceedings of ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 26-35, 2016.
  22. M. Motamedi, P. Gysel, V. Akella, and S. Ghiasi, "Design space exploration of FPGA-based deep convolutional neural networks," in Proceedings of Asia and South Pacific Design Automation Conference, pp. 575-580, 2016.
  23. J. Jo, S. Cha, D. Rho, and I.-C. Park, "DSIP: A scalable inference accelerator for convolutional neural networks," IEEE Journal of Solid-State Circuits, vol. 53, no. 2, pp. 605-618, Feb. 2018. https://doi.org/10.1109/JSSC.2017.2764045
  24. S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, "EIE: Efficient inference engine on compressed deep neural network," in Proceedings of International Symposium on Computer Architecture, pp. 243-254, 2016.
  25. S. Han, J. Kang, H. Mao, Y. Hu, X. Li, Y. Li, D. Xie, H. Luo, S. Yao, Y. Wang, H. Yang, and W. J. Dally, "ESE: Efficient speech recognition engine with sparse LSTM on FPGA," in Proceedings of ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 75-84, 2017.
  26. S. Zhang, Z. Du, L. Zhang, H. Lan, S. Liu, L. Li, Q. Guo, T. Chen, and Y. Chen, "Cambricon-X: An accelerator for sparse neural networks," in Proceedings of International Symposium on Microarchitecture, pp. 20:1-20:12, 2016.
  27. D. Kim, J. Ahn, and S. Yoo, "A novel zero weight/ activation-aware hardware architecture of convolutional neural network," in Proceedings of Design, Automation & Test in Europe Conference & Exhibition, pp. 1462-1467, 2017.
  28. V. Lebedev and V. Lempitsky, "Fast ConvNets using group-wise brain damage," in Proceedings of Computer Vision and Pattern Recognition, pp. 2554-2564, 2016.
  29. S. Anwar, K. Hwang, and W. Sung, "Structured pruning of deep convolutional neural networks," ACM Journal on Emerging Technologies in Computing Systems, vol. 13, no. 3, pp. 32:1-32:18, Feb. 2017.