Face Super-Resolution using Adversarial Distillation of Multi-Scale Facial Region Dictionary

  • Jo, Byungho (Inha University, Department of Electrical & Computer Engineering) ;
  • Park, In Kyu (Inha University, Department of Electrical & Computer Engineering) ;
  • Hong, Sungeun (Inha University, Department of Electrical & Computer Engineering)
  • Received : 2021.08.02
  • Accepted : 2021.09.24
  • Published : 2021.09.30

Abstract

Recent deep learning-based face super-resolution (FSR) methods have shown significant performance by utilizing facial prior knowledge, such as facial landmarks and dictionaries, that reflects the structural or semantic characteristics of the human face. However, most of these methods require additional processing time and memory. To address this issue, this paper proposes an efficient FSR model based on knowledge distillation. The intermediate features of a teacher network, which contain dictionary information derived from major facial regions, are transferred to a student network through adversarial multi-scale feature distillation. Experimental results show that the proposed model outperforms other SR methods and demonstrate its efficiency compared to the teacher model.

Unlike super-resolution of generic images, recent deep learning-based face super-resolution studies have achieved excellent results by exploiting prior and reference information, such as facial landmarks and key-region dictionaries, that reflects the structural or semantic characteristics of the human face. However, using such face-specific prior information has the drawback of requiring additional processing time and memory. To overcome this limitation, this paper proposes an efficient super-resolution model based on knowledge distillation. By applying knowledge distillation to a teacher model that uses dictionary information built from major facial regions, we construct a student model that needs neither landmark information nor an additional dictionary at inference time. The proposed student model is trained from the teacher model, which holds the facial key-region dictionary, through feature-map-level adversarial knowledge distillation. Experimental results demonstrate that the proposed student model is quantitatively and qualitatively superior while reducing computation by more than 90% compared to the teacher model.
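To make the described training scheme concrete, the following is a minimal PyTorch-style sketch of feature-map-level adversarial distillation from a dictionary-based teacher to a lightweight student, assuming both networks expose matching multi-scale intermediate feature maps and using a least-squares GAN objective. The module names (`FeatureDiscriminator`, `distillation_step`) and all hyperparameters are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch of adversarial multi-scale feature distillation (PyTorch).
# Assumes teacher(lr) and student(lr) each return (sr_image, [feat_scale1, feat_scale2, ...]).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDiscriminator(nn.Module):
    """Per-scale discriminator that judges whether a feature map comes from
    the teacher (real) or the student (fake). `channels` must match the
    channel count of the corresponding feature scale."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 1, 3, stride=1, padding=1),
        )

    def forward(self, feat):
        return self.net(feat)

def distillation_step(teacher, student, discriminators, lr_face, hr_face,
                      opt_student, opt_disc, lambda_adv=1e-3):
    """One training step: pixel reconstruction loss on the SR output plus a
    least-squares adversarial loss on intermediate features at each scale."""
    with torch.no_grad():
        _, teacher_feats = teacher(lr_face)   # frozen teacher provides target features
    sr, student_feats = student(lr_face)

    # 1) Update the discriminators: teacher features are "real", student features "fake".
    loss_d = 0.0
    for d, t_feat, s_feat in zip(discriminators, teacher_feats, student_feats):
        real = d(t_feat)
        fake = d(s_feat.detach())
        loss_d = loss_d + F.mse_loss(real, torch.ones_like(real)) \
                        + F.mse_loss(fake, torch.zeros_like(fake))
    opt_disc.zero_grad(); loss_d.backward(); opt_disc.step()

    # 2) Update the student: reconstruct the HR face and fool the discriminators.
    loss_adv = 0.0
    for d, s_feat in zip(discriminators, student_feats):
        pred = d(s_feat)
        loss_adv = loss_adv + F.mse_loss(pred, torch.ones_like(pred))
    loss_g = F.l1_loss(sr, hr_face) + lambda_adv * loss_adv
    opt_student.zero_grad(); loss_g.backward(); opt_student.step()
    return loss_g.item(), loss_d.item()
```

In this sketch the student never sees the facial-region dictionary directly; it only learns to imitate the teacher's dictionary-informed feature maps, which is what allows landmark and dictionary lookups to be dropped at inference time.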

Keywords

Acknowledgement

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (2020-0-01389, Artificial Intelligence Convergence Research Center (Inha University)) and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1A2C1006706).
