Multi-Agent Deep Reinforcement Learning for Fighting Game: A Comparative Study of PPO and A2C

Yoshua Kaleb Purwanto; Dae-Ki Kang

doi:10.7236/IJIBC.2024.16.3.192

International Journal of Internet, Broadcasting and Communication

Volume 16 Issue 3 / Pages 192-198 / 2024 / 2288-4920 (pISSN) / 2288-4939 (eISSN)

The Institute of Internet, Broadcasting and Communication (한국인터넷방송통신학회)

Yoshua Kaleb Purwanto (Department of Computer Engineering, Dongseo University) ;
Dae-Ki Kang (Department of Computer Engineering, Dongseo University)

Received : 2024.06.07
Accepted : 2024.06.20
Published : 2024.08.31

Abstract

This paper investigates multi-agent deep reinforcement learning in the fighting game Samurai Shodown using the Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C) algorithms. Agents are first trained separately for 200,000 timesteps using Convolutional Neural Network (CNN) and Multi-Layer Perceptron (MLP) with LSTM networks. PPO performs better early in training, with stable policy updates, while A2C adapts better and earns higher rewards over extended training, ultimately outperforming PPO after 1,000,000 timesteps. These findings highlight PPO's effectiveness for short-term training and A2C's advantages in long-term learning, and they underscore the importance of selecting an algorithm based on training duration and task complexity. The code is available at https://github.com/Lexer04/Samurai-Shodown-with-Reinforcement-Learning-PPO.
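
One way to read the contrast between PPO's "stable policy updates" and A2C's unconstrained updates is through the per-sample policy losses of the two algorithms. The following is a minimal sketch of the standard textbook formulations, not the authors' implementation: the function names are hypothetical, and the clip range of 0.2 is an assumed value that the abstract does not state.

```python
import math


def a2c_policy_loss(log_prob: float, advantage: float) -> float:
    """Advantage actor-critic policy loss for one sample: -log pi(a|s) * A.

    There is no bound on how far a single update can move the policy.
    """
    return -log_prob * advantage


def ppo_clip_loss(
    new_log_prob: float,
    old_log_prob: float,
    advantage: float,
    clip_range: float = 0.2,  # assumed value, not taken from the paper
) -> float:
    """PPO clipped surrogate loss for one sample.

    The probability ratio between the new and old policy is clipped to
    [1 - clip_range, 1 + clip_range], and the more pessimistic of the
    clipped and unclipped objectives is used.
    """
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
    return -min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # The new policy makes the action three times as likely as the old one.
    old_lp, new_lp = math.log(0.3), math.log(0.9)
    print("A2C loss:", a2c_policy_loss(new_lp, advantage=1.0))
    print("PPO loss:", ppo_clip_loss(new_lp, old_lp, advantage=1.0))
```

With a positive advantage, the PPO objective stops rewarding the policy once the ratio exceeds 1 + clip_range, which limits the size of each update. The A2C loss has no such cap, which is one plausible reason the two algorithms behave differently at different training lengths.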

Acknowledgement

This research was supported by the "Regional Innovation Strategy (RIS)" program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (MOE) (2023RIS-007) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2022R1A2C2012243).
