Deep Q-Network based Game Agents

Implementation of a Game Agent Using a Deep Q-Network

  • Received : 2019.03.15
  • Accepted : 2019.06.18
  • Published : 2019.08.30

Abstract

The video game Tetris is one of the most popular games, and it is well known that its rules can be modelled as an MDP (Markov Decision Process). This paper presents a DQN (Deep Q-Network) based game agent for Tetris. To this end, the state is defined as the captured image of the Tetris game board, and the reward is designed as a function of the number of lines cleared by the agent. The action is defined as left, right, rotate, drop, and a finite number of their combinations. In addition, PER (Prioritized Experience Replay) is employed to enhance learning performance. More than 500,000 episodes are used to train the network, and the game agent uses the trained network to make decisions. The performance of the developed algorithm is validated not only in simulation but also on a real Tetris robot agent built from a camera, two Arduinos, four servo motors, and 3D-printed artificial fingers.
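For illustration only, the following is a minimal sketch of the setup the abstract describes, not the authors' implementation: a small convolutional Q-network over the board image, a simplified proportional prioritized replay buffer in the spirit of Schaul et al. [8], and one importance-weighted TD update. The framework (PyTorch), the 20x10 board size, the network architecture, the transition format, and all hyperparameters are assumptions made for this example.

import numpy as np
import torch
import torch.nn as nn

ACTIONS = ["left", "right", "rotate", "drop"]  # the paper also uses combinations of these

class QNet(nn.Module):
    # Maps a 1 x 20 x 10 board image (assumed size) to one Q-value per action.
    def __init__(self, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 20 * 10, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.net(x)

class PrioritizedReplay:
    # Simplified proportional prioritized replay (list-based, no sum-tree, for clarity).
    def __init__(self, capacity=100000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prios = [], []

    def push(self, transition):
        # New transitions get the current max priority so they are replayed at least once.
        p = max(self.prios, default=1.0)
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.prios.pop(0)
        self.data.append(transition)
        self.prios.append(p)

    def sample(self, batch_size, beta=0.4):
        probs = np.asarray(self.prios) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()  # normalized importance-sampling weights
        return [self.data[i] for i in idx], idx, torch.tensor(weights, dtype=torch.float32)

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.prios[i] = abs(float(err)) + 1e-6

def dqn_update(q, q_target, buffer, optimizer, batch_size=32, gamma=0.99):
    # One DQN step: transitions are (state, action, reward, next_state, done),
    # where reward is some function of the number of cleared lines.
    batch, idx, w = buffer.sample(batch_size)
    s = torch.stack([b[0] for b in batch])                        # (B, 1, 20, 10)
    a = torch.tensor([b[1] for b in batch])                       # (B,)
    r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s2 = torch.stack([b[3] for b in batch])
    done = torch.tensor([b[4] for b in batch], dtype=torch.float32)
    q_sa = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_target(s2).max(1).values
    td_err = target - q_sa
    loss = (w * td_err.pow(2)).mean()                             # importance-weighted MSE
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    buffer.update_priorities(idx, td_err.detach())
    return loss.item()

A typical epsilon-greedy rollout would push (state, action, reward, next_state, done) tuples into the buffer and periodically copy the policy network weights into q_target; the exact reward shaping over cleared lines is defined in the paper and is not reproduced here.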

Keywords

References

  1. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015. https://doi.org/10.1038/nature14236
  2. E. Demaine, S. Hohenberger, and D. Liben-Nowell, "Tetris is hard, even to approximate," In Proceedings of the Ninth International Computing and Combinatorics Conference, pp. 351-363, 2003.
  3. I. Szita and A. Lorincz, "Learning Tetris using the noisy cross-entropy method," Neural Computation, vol. 18, no. 12, pp. 2936-2941, 2006. https://doi.org/10.1162/neco.2006.18.12.2936
  4. V. Gabillon, M. Ghavamzadeh, and B. Scherrer, "Approximate dynamic programming finally performs well in the game of Tetris," In Advances in Neural Information Processing Systems, pp. 1754-1762, 2013.
  5. C. J. C. H. Watkins and P. Dayan, "Technical Note: Q-Learning," Machine Learning, vol. 8, pp. 279-292, 1992.
  6. T. Mitchell, "Machine Learning," McGraw-Hill, 1997.
  7. H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double Q-learning," In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 2094-2100, 2016.
  8. T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized experience replay," In Proceedings of the International Conference on Learning Representations, 2016 (arXiv:1511.05952).
  9. Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas, "Dueling network architectures for deep reinforcement learning," In Proceedings of the International Conference on Machine Learning, vol. 48, pp. 1995-2003, 2016.