Actor-Critic Reinforcement Learning System with Time-Varying Parameters

  • Obayashi, Masanao (Department of Computer Science and Systems Engineering, Yamaguchi University) ;
  • Umesako, Kosuke (Department of Computer Science and Systems Engineering, Yamaguchi University) ;
  • Oda, Tazusa (Department of Computer Science and Systems Engineering, Yamaguchi University) ;
  • Kobayashi, Kunikazu (Department of Computer Science and Systems Engineering, Yamaguchi University) ;
  • Kuremoto, Takashi (Department of Computer Science and Systems Engineering, Yamaguchi University)
  • Published : 2003.10.22

Abstract

Recently, reinforcement learning has attracted the attention of many researchers because of its simple and flexible learning ability in a wide range of environments. Many reinforcement learning methods have been proposed so far, such as Q-learning, actor-critic, and the stochastic gradient ascent method. A reinforcement learning system can adapt to changes in the environment through its interaction with it. However, when the environment changes periodically, it cannot adapt to those changes well. In this paper, we propose a reinforcement learning system that can adapt to periodic changes in the environment by introducing time-varying adjustable parameters. Through a simulation study of a maze problem with an aisle that opens and closes periodically, it is shown that the proposed method works well, whereas the conventional method with constant adjustable parameters does not work well in such an environment.
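
To make the idea of "time-varying adjustable parameters" concrete, the sketch below is a minimal, hypothetical illustration rather than the authors' actual formulation: it assumes each adjustable parameter of an actor-critic agent takes a sinusoidal form theta(t) = a + b*sin(omega*t + phi), with the coefficients a, b, phi updated by the TD error (omega is fixed here for simplicity). The toy periodic task, the sigmoid Bernoulli policy, and the specific update rules are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical periodic task (not the paper's maze): reward +1 for action 0
# in the first half of each period and for action 1 in the second half,
# -1 otherwise. Period length T.
T = 40

def reward(t, action):
    good = 0 if (t % T) < T // 2 else 1
    return 1.0 if action == good else -1.0

class TimeVaryingActorCritic:
    """Actor-critic whose adjustable parameters are sinusoids of time.

    Each scalar parameter is theta(t) = a + b*sin(omega*t + phi); the
    coefficients (a, b, phi) are adjusted by the TD error, omega is fixed.
    This is only an illustrative guess at what time-varying parameters
    could look like; the paper's exact parameterization may differ.
    """
    def __init__(self, omega, alpha=0.05, beta=0.05, gamma=0.95):
        self.omega, self.alpha, self.beta, self.gamma = omega, alpha, beta, gamma
        # Actor preference and critic value each get (a, b, phi) coefficients.
        self.actor = np.zeros(3)
        self.critic = np.zeros(3)

    @staticmethod
    def _value(coeff, omega, t):
        a, b, phi = coeff
        return a + b * np.sin(omega * t + phi)

    @staticmethod
    def _grad(coeff, omega, t):
        # Gradient of theta(t) with respect to (a, b, phi).
        _, b, phi = coeff
        return np.array([1.0, np.sin(omega * t + phi), b * np.cos(omega * t + phi)])

    def act(self, t):
        # Bernoulli policy: probability of action 1 via a sigmoid of the preference.
        p1 = 1.0 / (1.0 + np.exp(-self._value(self.actor, self.omega, t)))
        return int(rng.random() < p1), p1

    def step(self, t):
        action, p1 = self.act(t)
        r = reward(t, action)
        v_t = self._value(self.critic, self.omega, t)
        v_next = self._value(self.critic, self.omega, t + 1)
        td_error = r + self.gamma * v_next - v_t
        # Critic: move the time-dependent value estimate toward the TD target.
        self.critic += self.alpha * td_error * self._grad(self.critic, self.omega, t)
        # Actor: policy-gradient-style update weighted by the TD error.
        dlogpi = (action - p1) * self._grad(self.actor, self.omega, t)
        self.actor += self.beta * td_error * dlogpi
        return r

agent = TimeVaryingActorCritic(omega=2 * np.pi / T)
returns = [agent.step(t) for t in range(20000)]
print("mean reward over last 1000 steps:", np.mean(returns[-1000:]))
```

Because the learned preference can oscillate in phase with the task, the agent can keep switching to the currently rewarded action; an agent with constant parameters would have to settle on a single compromise policy, which is the contrast the paper draws with the conventional method.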

Keywords