Performance Analysis of Hint-KD Training Approach for the Teacher-Student Framework Using Deep Residual Networks

  • Bae, Ji-Hoon (KSB Convergence Research Department, Electronics and Telecommunications Research Institute) ;
  • Yim, Junho (Department of Electrical Engineering, Korea Advanced Institute of Science and Technology) ;
  • Yu, Jaehak (KSB Convergence Research Department, Electronics and Telecommunications Research Institute) ;
  • Kim, Kwihoon (KSB Convergence Research Department, Electronics and Telecommunications Research Institute) ;
  • Kim, Junmo (Department of Electrical Engineering, Korea Advanced Institute of Science and Technology)
  • Received : 2016.12.16
  • Accepted : 2017.04.05
  • Published : 2017.05.25

Abstract


In this paper, we analyze the performance of the recently introduced Hint-based knowledge distillation (KD) training approach, which uses the teacher-student framework for knowledge distillation and knowledge transfer. For the teacher-student framework, we use the deep residual network (ResNet), currently regarded as one of the state-of-the-art deep neural network (DNN) models. Using Caffe, a widely used open-source deep learning framework, we investigate how the weight assigned to the teacher model's KD information, softened by a temperature parameter, affects the classification accuracy of the student model during Hint-KD training. The results show that the student model achieves higher recognition accuracy when the weight of the KD information is kept at its fixed initial value than when it is gradually decreased during training.
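The weighted, temperature-softened KD objective discussed above can be sketched as follows. This is a minimal NumPy illustration in the style of Hinton et al.'s KD loss, not the paper's actual implementation; the function and parameter names (`hint_kd_loss`, `kd_weight`, `T`) are illustrative.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax; a higher T yields a softer distribution."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hint_kd_loss(student_logits, teacher_logits, labels, T=4.0, kd_weight=0.5):
    """Combined loss: (1 - kd_weight) * hard-label cross-entropy
    plus kd_weight * T^2 * cross-entropy against the teacher's softened outputs.
    The T^2 factor keeps the soft-target gradients on the same scale as the
    hard-target gradients, as in Hinton et al. (2015)."""
    n = student_logits.shape[0]

    # Hard-label branch: standard cross-entropy at T = 1.
    p_student = softmax(student_logits)
    ce_hard = -np.mean(np.log(p_student[np.arange(n), labels] + 1e-12))

    # KD branch: cross-entropy between softened teacher and student outputs.
    q_teacher = softmax(teacher_logits, T)
    log_p_soft = np.log(softmax(student_logits, T) + 1e-12)
    ce_soft = -np.mean((q_teacher * log_p_soft).sum(axis=-1))

    return (1.0 - kd_weight) * ce_hard + kd_weight * (T ** 2) * ce_soft
```

Under this formulation, the comparison studied in the paper amounts to holding `kd_weight` at its initial value throughout training versus monotonically decaying it toward zero.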


References

  1. Y.-T. Park, "A comparative study of image recognition by neural network classifier and linear tree classifier", Journal of The Institute of Electronics and Information Engineers-B, vol. 31, no. 5, pp. 141-148, 1994.
  2. S. Hong, W. Im, J. Park, and H.-S. Yang, "Deep CNN-based person identification using facial and clothing features", in Proc. of Summer Conference on Institute of Electronics and Information Engineers (IEIE), pp. 2204-2207, June, 2016.
  3. Y. Shin, J.-H. Park, S. Shin, G. Lim, S. Song, C. Lee, and J.-M. Chung, "Improvement of image classification in augmented reality based on deep learning", in Proc. of Summer Conference on Institute of Electronics and Information Engineers (IEIE), pp. 1771-1773, June, 2016.
  4. G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network", arXiv preprint arXiv:1503.02531, pp. 1-19, 2015.
  5. A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio, "FitNets: Hints for thin deep nets", in Proc. of International Conference on Learning Representations (ICLR), pp. 1-13, San Diego, May 7-9, 2015.
  6. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition", in Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-12, Las Vegas, June 26-July 1, 2016.
  7. A. Veit, M. Wilber, and S. Belongie, "Residual networks are exponential ensembles of relatively shallow networks", arXiv preprint arXiv:1605.06431, pp. 1-12, 2016.
  8. I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks", arXiv preprint arXiv:1302.4389, pp. 1-9, 2013.
  9. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition", in Proc. of International Conference on Learning Representations (ICLR), pp. 1-14, San Diego, May 7-9, 2015.
  10. [Online available] "Caffe, deep learning framework", http://caffe.berkeleyvision.org/
  11. [Online available] "CIFAR-10 and CIFAR-100 datasets", https://www.cs.toronto.edu/~kriz/cifar.html
  12. [Online available] "MNIST dataset", http://yann.lecun.com/exdb/mnist