Multi-Agent Deep Reinforcement Learning for Fighting Game: A Comparative Study of PPO and A2C

  • Yoshua Kaleb Purwanto (Department of Computer Engineering, Dongseo University)
  • Dae-Ki Kang (Department of Computer Engineering, Dongseo University)
  • Received : 2024.06.07
  • Accepted : 2024.06.20
  • Published : 2024.08.31

Abstract

This paper investigates multi-agent deep reinforcement learning in the fighting game Samurai Shodown using the Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C) algorithms. Agents are first trained separately for 200,000 timesteps using a Convolutional Neural Network (CNN) and a Multi-Layer Perceptron (MLP) with LSTM networks. PPO performs better early in training, with stable policy updates, while A2C adapts better and earns higher rewards over extended training, ultimately outperforming PPO after 1,000,000 timesteps. These findings highlight PPO's effectiveness for short training runs and A2C's advantages in long-term learning, and they underline the importance of choosing an algorithm according to training duration and task complexity. The code is available at https://github.com/Lexer04/Samurai-Shodown-with-Reinforcement-Learning-PPO.
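The "stable policy updates" attributed to PPO above come from its clipped surrogate objective, which caps how far a single update can move the policy, whereas A2C uses the plain advantage-weighted log-probability. The following is a minimal per-sample sketch of that difference, not the authors' implementation: the function names are hypothetical, and the clip range of 0.2 is an assumed default rather than a value reported in the abstract.

```python
import math


def ppo_clipped_term(logp_new, logp_old, advantage, clip_range=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximised).

    clip_range=0.2 is an assumed default, not a value taken from the paper.
    """
    # Probability ratio between the updated policy and the data-collecting policy.
    ratio = math.exp(logp_new - logp_old)
    # Restrict the ratio to [1 - clip_range, 1 + clip_range].
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # Taking the minimum gives a pessimistic bound on the improvement.
    return min(ratio * advantage, clipped_ratio * advantage)


def a2c_policy_term(logp_new, advantage):
    """Per-sample A2C policy-gradient surrogate (to be maximised), with no clipping."""
    return logp_new * advantage


if __name__ == "__main__":
    # The new policy makes the action 1.5x more likely and the advantage is positive:
    # PPO credits the update only up to a ratio of 1.2.
    print(ppo_clipped_term(math.log(1.5), 0.0, advantage=2.0))
    # A2C applies the full advantage-weighted log-probability.
    print(a2c_policy_term(math.log(1.5), advantage=2.0))
```

In the PPO term, once the ratio leaves the clipping interval in the direction favoured by the advantage, the objective becomes flat and provides no further incentive to move the policy, which is one common explanation for PPO's steadier early training.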

Keywords

Acknowledgement

This research was supported by the "Regional Innovation Strategy (RIS)" through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (MOE) (2023RIS-007) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2022R1A2C2012243).
