Flexible Labeling Mechanism in LQ-learning for Maze Problems

  • Published : 2001.10.01

Abstract

Recently, Reinforcement Learning (RL) methods for MDPs have been extended and applied to POMDP problems. Hierarchical RL methods are currently widely studied; however, they have the drawback that learning time and memory are consumed merely to maintain the hierarchical structure, even when it is not necessary. In contrast, our previously proposed Labeling Q-learning (LQ-learning) has no hierarchical structure but adopts a characteristic internal memory mechanism. Namely, an LQ-learning agent perceives a state as a pair of an observation and its label, so the agent can more exactly distinguish states that look the same but are actually different. That is, at each step $t$, we define a new type of perception of the environment, $\tilde{o}_t = (o_t, \theta_t)$, where $o_t$ is the conventional observation and $\theta_t$ is the label attached to the observation. Then the conventional ...
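The core idea in the abstract, running Q-learning over the augmented perception $(o_t, \theta_t)$ rather than the raw, possibly aliased observation, can be sketched as follows. This is a minimal illustration under stated assumptions: the action set, hyperparameters, and especially the `next_label` rule are hypothetical placeholders, since the paper's actual flexible labeling mechanism is not given in this excerpt.

```python
import random
from collections import defaultdict

# Sketch of the LQ-learning idea from the abstract: the Q-table is keyed by
# the augmented perception (observation, label) instead of the observation
# alone, so two aliased observations can be told apart by their labels.
# The labeling rule below is an illustrative placeholder, NOT the paper's
# flexible labeling mechanism.

ACTIONS = ["up", "down", "left", "right"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
N_LABELS = 4  # assumed number of distinct labels the agent may attach

Q = defaultdict(float)  # Q[((obs, label), action)] -> estimated value

def choose_action(perception):
    """Epsilon-greedy selection over the augmented perception (obs, label)."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(perception, a)])

def next_label(label, obs):
    """Placeholder labeling rule: deterministic per-observation cycling.
    In LQ-learning the label update is a learned/designed internal-memory
    mechanism; this stub only shows where it plugs in."""
    return (label + hash(obs)) % N_LABELS

def q_update(perception, action, reward, next_perception):
    """Standard Q-learning update applied to labeled perceptions."""
    best_next = max(Q[(next_perception, a)] for a in ACTIONS)
    td_error = reward + GAMMA * best_next - Q[(perception, action)]
    Q[(perception, action)] += ALPHA * td_error
```

A perceptually aliased maze cell observed with label 0 and the same cell observed with label 1 index different Q-table rows, which is exactly how the label lets the agent break the aliasing.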

Keywords