Gaze-Manipulated Data Augmentation for Gaze Estimation With Diffusion Autoencoders


  • Kangryun Moon (Sungkyunkwan University) ;
  • Younghan Kim (Gachon University) ;
  • Yongjun Park (Visualcamp) ;
  • Yonggyu Kim (Visualcamp)
  • Received : 2024.06.15
  • Accepted : 2024.07.05
  • Published : 2024.07.25

Abstract

Collecting a large-scale dataset with ground-truth gaze vectors is costly in the gaze estimation field. In this paper, we propose a data augmentation method that manipulates the gaze of an original image, improving the accuracy of a gaze estimation model when the number of available gaze labels is limited. By training multi-class gaze bin classification as an auxiliary task and adjusting the latent variable of a diffusion autoencoder, the model semantically edits the gaze of the original image. Unlike conventional facial attribute editing, we manipulate a non-binary attribute, the pitch and yaw of the gaze vector, into a desired range and use the edited images as augmented training data. The improved accuracy of the gaze estimation network under semi-supervised learning validates the effectiveness of our data augmentation, especially when the number of gaze labels is 50k or fewer.

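Below is a minimal PyTorch sketch of the augmentation idea described in the abstract, assuming a DiffAE-style model that exposes an image encoder producing a semantic latent and a decoder that reconstructs an image from that latent. The module names, bin count, latent dimensionality, and the gradient-based latent editing rule are illustrative assumptions, not the authors' implementation; simple stand-in encoder/decoder modules keep the example self-contained.

```python
# Sketch: edit the semantic latent of a face image toward a target gaze bin and
# decode it as an augmented training sample. All sizes and modules are placeholders.
import torch
import torch.nn as nn

N_BINS = 9          # hypothetical number of pitch/yaw bins for the auxiliary task
LATENT_DIM = 512    # hypothetical dimensionality of the semantic latent


class GazeBinHead(nn.Module):
    """Auxiliary multi-class classifier over the semantic latent:
    one softmax over pitch bins and one over yaw bins."""

    def __init__(self, latent_dim: int = LATENT_DIM, n_bins: int = N_BINS):
        super().__init__()
        self.pitch = nn.Linear(latent_dim, n_bins)
        self.yaw = nn.Linear(latent_dim, n_bins)

    def forward(self, z_sem: torch.Tensor):
        return self.pitch(z_sem), self.yaw(z_sem)


def edit_gaze_latent(z_sem: torch.Tensor, head: GazeBinHead,
                     target_pitch_bin: int, target_yaw_bin: int,
                     step: float = 0.1, n_steps: int = 20) -> torch.Tensor:
    """Push z_sem toward the desired pitch/yaw bins by minimizing the
    classifier's cross-entropy for the target bins (a simple latent-manipulation
    scheme; the paper's exact editing rule may differ)."""
    z = z_sem.clone().detach().requires_grad_(True)
    target_p = torch.tensor([target_pitch_bin])
    target_y = torch.tensor([target_yaw_bin])
    for _ in range(n_steps):
        logit_p, logit_y = head(z)
        loss = (nn.functional.cross_entropy(logit_p, target_p)
                + nn.functional.cross_entropy(logit_y, target_y))
        grad, = torch.autograd.grad(loss, z)
        z = (z - step * grad).detach().requires_grad_(True)
    return z.detach()


if __name__ == "__main__":
    # Stand-ins for the pretrained diffusion-autoencoder encoder/decoder (assumptions).
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, LATENT_DIM))
    decoder = nn.Sequential(nn.Linear(LATENT_DIM, 3 * 64 * 64),
                            nn.Unflatten(1, (3, 64, 64)))
    head = GazeBinHead()

    face = torch.rand(1, 3, 64, 64)      # original labeled image
    z_sem = encoder(face)                # semantic latent of the original
    z_edit = edit_gaze_latent(z_sem, head, target_pitch_bin=2, target_yaw_bin=6)
    augmented = decoder(z_edit)          # edited image used as extra training data
    print(augmented.shape)               # torch.Size([1, 3, 64, 64])
```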

Acknowledgement

This research was supported by Visualcamp in 2024.
