Luxo character control using deep reinforcement learning

  • Jeongmin Lee (Department of Computer Software, Hanyang University)
  • Yoonsang Lee (Department of Computer Software, Hanyang University)
  • Received : 2020.04.20
  • Accepted : 2020.07.28
  • Published : 2020.09.01

Abstract

Motion synthesis using physics-based controllers can generate character animation that interacts naturally with the given environment and with other characters. Recently, various methods using deep neural networks have improved the quality of motions generated by physics-based controllers. In this paper, we present a control policy learned by deep reinforcement learning (DRL) that enables Luxo, the mascot character of Pixar Animation Studios, to run towards a random goal location while imitating a reference motion and maintaining its balance. Instead of directly training our DRL network to make Luxo reach a goal location, we use a reference motion generated to preserve the jumping style of the Luxo animation. The reference motion is produced by linearly interpolating predetermined poses, each of which is defined by the joint angles of the Luxo character. With this method, we obtain a better Luxo policy than one trained without any reference motion.
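
As a concrete illustration of the reference-motion construction described above, the sketch below linearly interpolates a set of predetermined joint-angle poses into a dense reference trajectory. The actual implementation is not part of this page; the key poses, joint set, timings, and cycle length here are hypothetical placeholders, not the values used in the paper.

    import numpy as np

    # Hypothetical key poses for one jump cycle of the Luxo character, given as
    # joint angles in radians (base pitch, hip, knee, neck). The real joint set
    # and pose values are not published here; these are placeholders.
    KEY_POSES = np.array([
        [0.00, -0.60,  1.10, -0.30],   # crouch
        [0.15, -0.10,  0.20, -0.10],   # take-off
        [0.10,  0.30, -0.20,  0.20],   # airborne
        [0.00, -0.60,  1.10, -0.30],   # landing, back to crouch
    ])
    KEY_TIMES = np.array([0.0, 0.15, 0.35, 0.5])  # seconds; assumed 0.5 s cycle

    def reference_pose(t, cycle=0.5):
        """Linearly interpolate the predetermined key poses at phase time t."""
        phase = t % cycle
        # Interpolate each joint angle independently over the key times.
        return np.array([
            np.interp(phase, KEY_TIMES, KEY_POSES[:, j])
            for j in range(KEY_POSES.shape[1])
        ])

    # Sample the reference motion at an assumed control frequency of 30 Hz.
    reference_motion = [reference_pose(i / 30.0) for i in range(60)]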

If we can build a physics-based controller that makes a character perform a desired motion within a simulation, we can generate character animation that reacts naturally to changes in the surrounding environment and to interactions with other characters. Recently, many studies have used deep reinforcement learning to let physics-based controllers synthesize more stable and more diverse motions. In this paper, we present a deep reinforcement learning model that makes Luxo, the one-legged mascot character of Pixar Animation Studios, hop to a given goal location. To help it learn an efficient hopping motion, we build a reference motion by linearly interpolating the joint angles of Luxo, and the character learns a control policy that reaches the target position while imitating this reference and maintaining its balance. Compared with a policy trained to control Luxo without any reference motion, the proposed method learns a policy that hops toward a user-specified location more efficiently.
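
The reward design is likewise not spelled out on this page; the following is a rough, hypothetical sketch of how a pose-imitation term and a goal-reaching term could be combined into a single per-step reward, in the spirit of imitation-based DRL. The weights, exponential scales, and state layout are illustrative assumptions, not the paper's actual reward.

    import numpy as np

    # Illustrative weights for the imitation and goal terms (assumed values).
    W_IMITATE, W_GOAL = 0.7, 0.3

    def reward(joint_angles, ref_angles, root_pos, goal_pos):
        # Imitation term: penalize deviation from the reference pose.
        pose_err = np.sum((joint_angles - ref_angles) ** 2)
        r_imitate = np.exp(-2.0 * pose_err)

        # Goal term: encourage moving the root toward the target location.
        dist = np.linalg.norm(goal_pos - root_pos)
        r_goal = np.exp(-0.5 * dist)

        return W_IMITATE * r_imitate + W_GOAL * r_goal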

Keywords

Acknowledgement

This work was supported by the Software-centered University Program of the Ministry of Science and ICT (MSIT) and the Institute of Information & Communications Technology Planning & Evaluation (IITP) (2016-0-00023), by National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT) (No. 2019R1C1C1006778, NRF-2019R1A4A1029800), and by the 'High-Performance Computing Support' program of the MSIT and the National IT Industry Promotion Agency (NIPA).
