Comparison of the learning performance of a character controller based on deep reinforcement learning according to state representation


  • 손채준 (Department of Computer Software, Hanyang University)
  • 권태수 (Department of Computer Software, Hanyang University)
  • 이윤상 (Department of Computer Software, Hanyang University)
  • Received : 2021.11.12
  • Accepted : 2021.11.26
  • Published : 2021.12.01

Abstract

Research on physics-based character motion control using reinforcement learning continues to be actively carried out. To solve a problem with reinforcement learning, the network structure, hyperparameters, state, action, and reward must be set appropriately for that problem. Many studies have defined various combinations of states, actions, and rewards and applied them successfully. Because there are many possible choices when defining the state, action, and reward, studies have also analyzed the effect of each element in order to find the combination that yields the best learning performance. In this work, we analyze the effect of the state representation on reinforcement learning performance, which has not been studied so far. First, we define three coordinate systems: the root attached frame, the root aligned frame, and the projected aligned frame, and analyze how states expressed in each coordinate system affect reinforcement learning. Second, we analyze how learning performance is affected when the state is composed of various combinations of joint positions and joint angles.
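The three coordinate frames compared above can be sketched in code. The conventions below (Y-up, Z-forward, and the exact construction of each frame) are our assumptions for illustration; the paper's precise definitions may differ.

```python
import numpy as np

UP = np.array([0.0, 1.0, 0.0])  # assumed world up axis (Y-up)

def heading_rotation(root_rot):
    # Extract a yaw-only (heading) rotation from the full root rotation
    # by projecting the root's forward axis onto the ground plane.
    fwd = root_rot @ np.array([0.0, 0.0, 1.0])  # assumed Z-forward
    fwd = fwd - UP * (fwd @ UP)                 # project onto ground plane
    fwd = fwd / np.linalg.norm(fwd)
    right = np.cross(UP, fwd)
    return np.stack([right, UP, fwd], axis=1)   # columns are the frame axes

def to_root_attached(p_world, root_pos, root_rot):
    # Root attached frame: origin at the root, full root orientation,
    # so the state tilts and rolls together with the root link.
    return root_rot.T @ (p_world - root_pos)

def to_root_aligned(p_world, root_pos, root_rot):
    # Root aligned frame: origin at the root, but only the heading
    # rotation is removed; the vertical axis stays aligned with world up.
    return heading_rotation(root_rot).T @ (p_world - root_pos)

def to_projected_aligned(p_world, root_pos, root_rot):
    # Projected aligned frame: origin at the root projected onto the
    # ground plane, with the same heading-only rotation.
    origin = root_pos - UP * (root_pos @ UP)
    return heading_rotation(root_rot).T @ (p_world - origin)
```

For example, with an upright root, the root attached and root aligned frames give identical coordinates, while the projected aligned frame additionally preserves each joint's absolute height above the ground in its vertical component.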



Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (No. 2019R1C1C1006778, NRF-2019R1A4A1029800).
