Card Battle Game Agent Based on Reinforcement Learning with Play Level Control

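  • 이용철 (Department of Electronics and Computer Engineering, Chonnam National University)
  • 이칠우 (Department of Electronics and Computer Engineering, Chonnam National University)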
  • Received : 2023.11.22
  • Accepted : 2024.01.02
  • Published : 2024.02.29

Abstract

Game agents, the behavioral entities that play a game, are a crucial factor in player satisfaction. However, creating game agents for various game levels, environments, and players takes considerable time and effort. In addition, when the game environment changes, for example when content is added or characters are updated, new game agents must be developed, and development difficulty gradually increases. It is also important to have game agents that can be tailored to players of different skill levels, because an agent that can play at various levels is more useful and can satisfy more players than an agent that only plays at a high level. In this paper, we propose a method for training game agents and controlling their play level, so that agents can be developed rapidly and fine-tuned for various game environments and changes. For training, we apply IMPALA, a policy-based distributed reinforcement learning method, to allow flexible handling of various action structures and fast learning. Once training is complete, actions are chosen by sampling from the probabilities given by the Softmax-Temperature method. The results show that the game agent's play level decreases as the Temperature value increases, which indicates that the play level can be controlled easily.

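An agent, the behavioral entity that plays a game, is an important factor in raising game satisfaction. However, developing game agents for various difficulty levels, game environments, and players requires a great deal of time and effort. Moreover, when the game environment changes, such as when characters are added or updated, a new game agent must be developed, and the development difficulty gradually rises. Fine-grained game agents matched to the levels of different players are also important, because an agent capable of play at fine-grained levels is more useful than one that is simply strong and can raise player satisfaction. This paper proposes a method, targeting card battle games, that enables fast agent training and fine-grained control of play level. The proposed method first applies IMPALA, one of the policy-based distributed reinforcement learning methods, for a high degree of freedom in action composition and fast learning in a multi-agent environment. Fine-grained play-level control is performed by sampling from the per-action probability values obtained through Temperature-Softmax. The paper confirms that the agent's play level falls as the Temperature value increases, and that varying this value makes it easy to obtain a range of play levels.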
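
The play-level control described in the abstract, sampling actions from a temperature-scaled softmax over the trained policy's outputs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `temperature_softmax` and `sample_action`, the example logits, and the temperature values are all hypothetical, and the sketch assumes the policy exposes one real-valued score (logit) per action. It uses only the Python standard library.

```python
import math
import random


def temperature_softmax(logits, temperature):
    """Turn per-action scores into probabilities at a given temperature.

    Hypothetical helper for illustration. A higher temperature flattens
    the distribution, so weaker actions are chosen more often.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, temperature, rng=random):
    """Sample an action index from the temperature-scaled distribution."""
    probs = temperature_softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Assumed policy outputs for three candidate actions (not from the paper).
    example_logits = [2.0, 1.0, 0.1]
    for t in (0.5, 1.0, 5.0):
        probs = temperature_softmax(example_logits, t)
        print(t, [round(p, 3) for p in probs])
```

As the temperature grows, the probability of the highest-scoring action shrinks toward that of the others, which is consistent with the reported result that play level decreases as the Temperature value increases.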
