Multi-Agent Deep Reinforcement Learning for Fighting Game: A Comparative Study of PPO and A2C

Yoshua Kaleb Purwanto;Dae-Ki Kang;

doi:10.7236/IJIBC.2024.16.3.192

International Journal of Internet, Broadcasting and Communication

Vol. 16, No. 3 / pp. 192-198 / 2024 / pISSN 2288-4920 / eISSN 2288-4939

The Institute of Internet, Broadcasting and Communication

Yoshua Kaleb Purwanto (Department of Computer Engineering, Dongseo University) ;
Dae-Ki Kang (Department of Computer Engineering, Dongseo University)

Received: 2024.06.07
Reviewed: 2024.06.20
Published: 2024.08.31

Abstract

This paper investigates multi-agent deep reinforcement learning in the fighting game Samurai Shodown using the Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C) algorithms. Agents are first trained separately for 200,000 timesteps using Convolutional Neural Network (CNN) and Multi-Layer Perceptron (MLP) with LSTM networks. PPO performs better early in training, with stable policy updates, while A2C adapts better and earns higher rewards over extended training, outperforming PPO after 1,000,000 timesteps. These findings highlight PPO's effectiveness for short-term training and A2C's advantages in long-term learning, and they underline the importance of choosing an algorithm based on training duration and task complexity. The code is available at https://github.com/Lexer04/Samurai-Shodown-with-Reinforcement-Learning-PPO.
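
The abstract attributes PPO's early advantage to stable policy updates. The following is a minimal sketch of where that stability usually comes from, assuming the standard clipped surrogate objective for PPO and the plain advantage-weighted log-probability term for A2C. The function names, the sample probabilities, and the default clip_range of 0.2 are illustrative assumptions, not values taken from the paper or its repository.

```python
import math


def a2c_policy_term(logp_new: float, advantage: float) -> float:
    """A2C-style per-sample objective: log pi(a|s) * A (no limit on step size)."""
    return logp_new * advantage


def ppo_clipped_term(logp_new: float, logp_old: float, advantage: float,
                     clip_range: float = 0.2) -> float:
    """PPO per-sample clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A).

    r is the probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
    clip_range (eps) is an assumed illustrative default.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # Taking the minimum keeps the objective a pessimistic bound, so the
    # policy gains nothing by moving the ratio far outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical example: the old policy chose the action with probability 0.25.
    logp_old = math.log(0.25)
    advantage = 1.0
    for p_new in (0.25, 0.30, 0.50, 0.90):
        logp_new = math.log(p_new)
        print(
            f"p_new={p_new:.2f}  "
            f"A2C term={a2c_policy_term(logp_new, advantage):+.3f}  "
            f"PPO term={ppo_clipped_term(logp_new, logp_old, advantage):+.3f}"
        )
```

With a positive advantage, the PPO term stops growing once the ratio exceeds 1 + clip_range, whereas the A2C term keeps rewarding larger probability changes. This is one common explanation for PPO's steadier early updates; it does not by itself explain the long-run result reported in the abstract.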

Acknowledgement

This research was supported by the "Regional Innovation Strategy (RIS)" through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (MOE) (2023RIS-007) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2022R1A2C2012243).

