Area-wise relational knowledge distillation

  • Sungchul Cho (Department of Applied Statistics, Chung-Ang University)
  • Sangje Park (Department of Applied Statistics, Chung-Ang University)
  • Changwon Lim (Department of Applied Statistics, Chung-Ang University)
  • Received: 2023.03.22
  • Reviewed: 2023.04.10
  • Published: 2023.09.30


Knowledge distillation (KD) extracts knowledge from a large, complex model (the teacher) and transfers it to a relatively small model (the student). Typically, the teacher is trained first, the activation values of its hidden or output layers are recorded, and the student is then trained on the same training data using those recorded values. Recently, relational KD (RKD) has been proposed to extract knowledge about the relative differences among training examples, and it improved student performance compared with conventional KD. In this paper, we propose a new RKD method based on a new loss function. The proposed loss is defined by the difference in area between the teacher model and the student model at a specific hidden layer. We show that, with this loss, a model can be compressed successfully and its generalization performance can be improved. In a model compression study on audio data, the proposed method achieves accuracy up to 1.8% higher than that of the existing method. In a model generalization study on image data, introducing the RKD method into self-KD improves accuracy by up to 0.5%.
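
The abstract does not state the exact form of the area-based loss, so the following is only an illustrative sketch of one possible reading, not the authors' definition. It assumes that "area" means the area of the triangle spanned by each triplet of hidden-layer embeddings in a mini-batch, and that teacher and student areas are divided by their batch mean before being compared (an assumption mirroring the mean-distance normalization used in distance-wise RKD). The names `triangle_areas` and `area_relational_loss` are hypothetical.

```python
import itertools

import numpy as np


def triangle_areas(emb):
    """Areas of the triangles spanned by every triplet of rows of emb (n, d)."""
    areas = []
    for i, j, k in itertools.combinations(range(len(emb)), 3):
        u = emb[j] - emb[i]
        v = emb[k] - emb[i]
        # Gram-determinant formula, valid in any dimension d:
        # area = 0.5 * sqrt(|u|^2 |v|^2 - (u . v)^2)
        gram = np.dot(u, u) * np.dot(v, v) - np.dot(u, v) ** 2
        areas.append(0.5 * np.sqrt(max(gram, 0.0)))  # clamp rounding noise
    return np.asarray(areas)


def area_relational_loss(teacher_emb, student_emb, eps=1e-12):
    """Mean squared difference between mean-normalized triangle areas.

    ASSUMPTION: this is a guess at an area-based relational loss, not the
    loss defined in the paper.
    """
    t = triangle_areas(teacher_emb)
    s = triangle_areas(student_emb)
    # Normalize so that embeddings of different scale or width are comparable.
    t = t / (t.mean() + eps)
    s = s / (s.mean() + eps)
    return float(np.mean((t - s) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(6, 16))  # 6 examples, 16-dim teacher layer
    student = rng.normal(size=(6, 4))   # same examples, 4-dim student layer
    print(area_relational_loss(teacher, student))
```

Because only relative areas are compared, the teacher and student layers may have different widths, and a student whose embeddings are a uniformly rescaled copy of the teacher's incurs essentially zero loss under this sketch. In training, such a term would typically be added to the student's task loss with a weighting factor, which is likewise an assumption here.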



This research was supported by the Chung-Ang University research grant in 2020. This research was also supported by the Next-Generation Information Computing Development Program through the National Research Foundation (NRF) of Korea and the NRF grant funded by the Ministry of Science and ICT (NRF-2017M3C4A7083281, NRF-2021R1F1A1056516).


