A Study on Lightweight Transformer Based Super Resolution Model Using Knowledge Distillation

  • Dong-hyun Kim (School of Computer Science and Engineering, Kyungpook National University) ;
  • Dong-hun Lee (School of Computer Science and Engineering, Kyungpook National University) ;
  • Aro Kim (School of Computer Science and Engineering, Kyungpook National University) ;
  • Vani Priyanka Galia (School of Computer Science and Engineering, Kyungpook National University) ;
  • Sang-hyo Park (School of Computer Science and Engineering, Kyungpook National University)
  • Received : 2023.04.07
  • Accepted : 2023.05.17
  • Published : 2023.05.30

Abstract

Transformer models, originally developed for natural language processing, have recently been applied to image super-resolution with strong results. However, these transformer-based models are complex and have many learnable parameters, so they demand substantial hardware resources and are difficult to deploy on small mobile devices. In this paper, we therefore propose a knowledge distillation technique that effectively reduces the size of a transformer-based super-resolution model. Experiments confirm that a student model with fewer transformer blocks, trained with the proposed technique, achieves performance similar to or higher than that of the teacher model.
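
The abstract does not specify the form of the distillation objective, so the following is only a minimal sketch of generic output-level knowledge distillation for super-resolution, not the method proposed in the paper. It assumes the student is trained on a weighted sum of an L1 reconstruction loss against the ground-truth image and an L1 imitation loss against the teacher's output. The function names, the flat pixel lists, and the weight `alpha` are all hypothetical and chosen for illustration.

```python
from typing import Sequence


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two equally sized, non-empty pixel sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student_sr: Sequence[float],
    teacher_sr: Sequence[float],
    ground_truth: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Blend a reconstruction term with a teacher-imitation term.

    alpha is a hypothetical weight (an assumption, not from the paper):
    0 gives plain supervised training, 1 makes the student only imitate
    the teacher's super-resolved output.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    reconstruction = l1(student_sr, ground_truth)  # student vs. high-resolution target
    imitation = l1(student_sr, teacher_sr)         # student vs. frozen teacher output
    return (1.0 - alpha) * reconstruction + alpha * imitation


if __name__ == "__main__":
    # Toy two-pixel "images" with values in [0, 1].
    loss = distillation_loss([0.5, 0.5], [0.25, 0.75], [0.0, 1.0], alpha=0.5)
    print(loss)
```

In this sketch the teacher output is treated as a fixed target, which is the usual arrangement when a smaller student (here, one with fewer transformer blocks) is trained against a pretrained teacher.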

Acknowledgement

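This research was supported in 2023 by the Institute of Information & Communications Technology Planning & Evaluation (IITP), funded by the Korean government (Ministry of Science and ICT) (No. RS-2022-00167169, Development of technology for acquiring and processing immersive video for a photorealistic metaverse based on mobile robots).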
