Layer-wise hint-based training for knowledge transfer in a teacher-student framework

  • Bae, Ji-Hoon (KSB Convergence Research Department, Electronics and Telecommunications Research Institute)
  • Yim, Junho (School of Electrical Engineering, Korea Advanced Institute of Science and Technology)
  • Kim, Nae-Soo (KSB Convergence Research Department, Electronics and Telecommunications Research Institute)
  • Pyo, Cheol-Sig (KSB Convergence Research Department, Electronics and Telecommunications Research Institute)
  • Kim, Junmo (School of Electrical Engineering, Korea Advanced Institute of Science and Technology)
  • Received : 2018.03.27
  • Accepted : 2018.09.05
  • Published : 2019.04.07

Abstract

We devise a layer-wise hint training method to improve the existing hint-based knowledge distillation (KD) training approach, which is used for knowledge transfer in a teacher-student framework based on a residual network (ResNet). The proposed method first trains the student ResNet iteratively, incrementally employing hint-based information extracted from the pretrained teacher ResNet containing several hint and guided layers. Next, typical softening-factor-based KD training is performed using the previously estimated hint-based information. We compare the recognition accuracy of the proposed approach with that of KD training without hints, hint-based KD training, and ResNet-based layer-wise pretraining on the CIFAR-10, CIFAR-100, and MNIST benchmark datasets. When multiple selected hint-based information items are transferred layer-wise by the proposed method, the trained student ResNet reflects the pretrained teacher ResNet's rich information more accurately than the baseline training methods on all the benchmark datasets considered in this study.
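The two ingredients named in the abstract, a hint loss between teacher and student layers and a softening-factor-based KD loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the half squared-error form of the hint loss, the weighting parameter `lam`, and the one-pair-per-stage schedule are all assumptions made for illustration, and features are plain Python lists rather than ResNet activations.

```python
import math


def softmax(logits, tau=1.0):
    """Softmax with softening factor tau; larger tau gives a flatter distribution."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hint_loss(teacher_hint, student_guided):
    """Half squared L2 distance between a teacher hint feature vector and the
    matching student guided feature vector (assumed to have equal length)."""
    return 0.5 * sum((t - s) ** 2 for t, s in zip(teacher_hint, student_guided))


def kd_loss(student_logits, teacher_logits, label, tau=4.0, lam=0.5):
    """Weighted sum of hard-label cross-entropy and softened cross-entropy
    against the teacher. `lam` is a hypothetical weighting parameter."""
    p_teacher = softmax(teacher_logits, tau)
    p_student_soft = softmax(student_logits, tau)
    p_student = softmax(student_logits, 1.0)
    hard = -math.log(p_student[label])
    soft = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student_soft))
    return (1.0 - lam) * hard + lam * soft


def layerwise_schedule(num_pairs):
    """Assumed schedule: one hint stage per hint/guided pair, shallowest first,
    followed by a final KD stage. The paper's actual schedule may differ."""
    stages = [("hint", k) for k in range(1, num_pairs + 1)]
    stages.append(("kd", None))
    return stages


if __name__ == "__main__":
    for stage in layerwise_schedule(3):
        print(stage)
    print(hint_loss([1.0, 2.0], [0.0, 0.0]))
    print(kd_loss([2.0, 0.5, 0.1], [3.0, 0.2, 0.1], label=0))
```

In a real training loop, each `"hint"` stage would minimize `hint_loss` for its layer pair by gradient descent on the student parameters up to the guided layer, and the final `"kd"` stage would minimize `kd_loss` over the whole student network.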
