CINEMAPIC: Generative AI-based movie concept photo booth system

  • Seokhyun Jeong (The Global School of Media, Soongsil University) ;
  • Seungkyu Leem (The Global School of Media, Soongsil University) ;
  • Jungjin Lee (The Global School of Media, Soongsil University)
  • Received : 2024.06.15
  • Accepted : 2024.07.05
  • Published : 2024.07.25

Abstract

Photo booths have traditionally provided a fun and easy way to capture and print photos that preserve memories, letting individuals pose as they wish with props and share the experience with friends and family. To enable more diverse expression, generative AI-powered photo booths have recently emerged. However, existing AI photo booths face challenges such as the difficulty of taking group photos, the inability to accurately reflect users' poses, and the difficulty of applying a different concept to each subject. To tackle these issues, we present CINEMAPIC, a photo booth system that allows users to freely choose the poses, positions, and concepts of their photos. To apply individualized concepts, the system workflow comprises three main steps: pre-processing, generation, and post-processing. To produce high-quality group photos, the system generates a transparent image for each person and enhances the backdrop-composited image through a small number of denoising steps. The workflow is accelerated by applying an optimized diffusion model and GPU parallelization. The system was implemented as a prototype, and its effectiveness was validated through a user study and a large-scale pilot operation involving approximately 400 users. The results showed a significant preference for the proposed system over existing methods, confirming its potential for real-world photo booth applications. The proposed CINEMAPIC photo booth is expected to lead the way toward a more creative and differentiated market, with potential for widespread application in various fields.
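To make the workflow concrete, the sketch below illustrates the "compose, then lightly re-denoise" idea described above. It is a minimal illustration under stated assumptions, not the authors' implementation: each subject is assumed to have already been generated as a transparent RGBA layer (e.g., by a LayerDiffusion-style model), Pillow handles the alpha compositing, and the artifact-reducing regeneration is approximated here with an off-the-shelf SDXL img2img pass at low denoising strength. The helper names are hypothetical.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionXLImg2ImgPipeline


def composite_subjects(backdrop, layers):
    """Alpha-composite each transparent (RGBA) subject layer onto the
    generated backdrop at its user-chosen (x, y) position."""
    canvas = backdrop.convert("RGBA")
    for layer, (x, y) in layers:
        canvas.alpha_composite(layer.convert("RGBA"), dest=(x, y))
    return canvas.convert("RGB")


# A low `strength` makes img2img skip most of the diffusion trajectory
# and run only the final denoising steps: enough to blend seams and
# lighting without changing the subjects' poses or identities.
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")


def refine_composite(composite, prompt):
    # strength=0.2 over 30 scheduled steps executes roughly 6 denoising
    # steps -- the "small number of denoising steps" the abstract mentions.
    return pipe(prompt=prompt, image=composite,
                strength=0.2, num_inference_steps=30).images[0]
```

In this reading, the per-subject transparent layers are what allow each person to carry an individual concept, while the short refinement pass only has to repair compositing artifacts rather than regenerate the scene.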

Acknowledgement

This research was supported by the Metaverse Convergence Graduate School program (IITP-2024-RS-2024-00430997, contribution rate 20%) and the Regional Intelligence Innovation Talent Development Program (IITP-2024-RS-2022-00156360, contribution rate 20%) of the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP), and by the 2024 Culture, Sports and Tourism R&D Program of the Ministry of Culture, Sports and Tourism and the Korea Creative Content Agency (project title: Development of AI-based video expansion and service technology for high-resolution (8K/16K) services of performance content; project number: RS-2024-00395886; contribution rate: 60%).
