Punching Motion Generation using Reinforcement Learning and Trajectory Search Method

(Korean title: A Punching Motion Generation Method Using a Trajectory Search Technique and Reinforcement Learning)

  • Received : 2018.06.19
  • Accepted : 2018.07.03
  • Published : 2018.08.31

Abstract

Recent advances in machine learning approaches such as deep neural networks and reinforcement learning offer significant performance improvements in generating detailed and varied motions in physically simulated virtual environments. These optimization methods are highly attractive because they require little understanding of the underlying physics or mechanisms, even for high-dimensional, subtle control problems. In this paper, we propose an efficient learning method for a stochastic policy represented as a deep neural network, so that the agent can generate various energetic motions that adapt to changes in tasks and states without losing interactivity and robustness. This strategy is realized by a novel trajectory search method motivated by trust region policy optimization. Our value-based trajectory smoothing technique finds stably learnable trajectories without consulting the neural network's responses directly. These trajectories are then set as a trust region for the artificial neural network, so that it learns the desired motion quickly.
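
As a rough illustration of the trust-region idea referenced above, the sketch below performs a KL-constrained update of a diagonal Gaussian policy via a backtracking line search, in the spirit of trust region policy optimization. It is a minimal, assumption-laden example: the linear-Gaussian policy, the function names (gaussian_kl, trust_region_step), and all hyperparameters are illustrative and are not taken from the paper.

```python
# Minimal TRPO-style sketch: accept the largest step along a gradient direction
# whose updated policy stays within a KL-divergence trust region.
# All names, shapes, and hyperparameters here are illustrative assumptions;
# they are not taken from the paper.
import numpy as np


def gaussian_kl(mu_old, std_old, mu_new, std_new):
    """Mean KL divergence between two diagonal Gaussian policies."""
    var_old, var_new = std_old ** 2, std_new ** 2
    kl = (np.log(std_new / std_old)
          + (var_old + (mu_old - mu_new) ** 2) / (2.0 * var_new)
          - 0.5)
    return kl.sum(axis=-1).mean()


def trust_region_step(theta, direction, states, policy_mu, std,
                      max_kl=1e-2, step_init=1.0, backtrack=0.5, max_iters=10):
    """Backtracking line search that enforces the KL trust region.

    `direction` stands in for a (natural) policy-gradient ascent direction;
    a full TRPO update would also check improvement of the surrogate objective.
    """
    mu_old = policy_mu(theta, states)
    step = step_init
    for _ in range(max_iters):
        theta_new = theta + step * direction
        mu_new = policy_mu(theta_new, states)
        if gaussian_kl(mu_old, std, mu_new, std) <= max_kl:
            return theta_new          # update stays inside the trust region
        step *= backtrack             # otherwise shrink the step and retry
    return theta                      # no acceptable step found; keep old policy


# Toy usage with a linear-Gaussian policy (mean action = state @ theta).
rng = np.random.default_rng(0)
state_dim, action_dim = 6, 3
theta = rng.normal(scale=0.1, size=(state_dim, action_dim))
states = rng.normal(size=(128, state_dim))
std = np.full(action_dim, 0.2)

policy_mu = lambda th, s: s @ th                       # mean action per state
direction = rng.normal(scale=0.01, size=theta.shape)   # stand-in ascent direction
theta = trust_region_step(theta, direction, states, policy_mu, std)
```

The key design choice in this style of update is that the step size is chosen by the KL constraint rather than by a fixed learning rate, which keeps successive policies close enough to each other for learning to remain stable.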

Keywords
