Credit-Assigned-CMAC-based Reinforcement Learning with Application to the Acrobot Swing Up Control Problem

Credit-Assigned-CMAC-based Reinforcement Learning for Acrobot Swing-Up Control

  • 장시영 (Division of Electronics, Electrical, Control and Instrumentation Engineering, Hanyang University);
  • 신연용 (Division of Electronics, Electrical, Control and Instrumentation Engineering, Hanyang University);
  • 서승환 (Mechatronics Engineering, Hanyang University);
  • 서일홍 (Graduate School of Information and Communications, Hanyang University)
  • Published: 2004.07.01

Abstract

For real-world applications of reinforcement learning, function approximation or generalization is required to avoid the curse of dimensionality. To this end, an improved function-approximation-based reinforcement learning method is proposed that speeds up convergence by using a CA-CMAC (Credit-Assigned Cerebellar Model Articulation Controller). To show that the proposed CACRL (CA-CMAC-based Reinforcement Learning) performs better than CRL (CMAC-based Reinforcement Learning), computer simulation and experimental results are presented for the swing-up control problem of an acrobot.
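
Read as an algorithm, the key difference from a plain CMAC is in the update rule: instead of spreading the correction equally over the activated memory cells, a credit-assigned CMAC distributes it in proportion to the inverse of each cell's past update count, so rarely trained cells absorb more of the error. The sketch below illustrates that idea; the hashed tile-coding layout, the resolution of 0.1 per state dimension, the class name CACMAC, and the learning-rate value are illustrative assumptions, not the authors' implementation.

    import numpy as np

    class CACMAC:
        """Minimal credit-assigned CMAC sketch: a tiled, hashed function
        approximator whose update spreads the error over the active cells
        in proportion to how rarely each cell has been updated so far."""

        def __init__(self, n_tilings=8, n_cells=4096, lr=0.5, seed=0):
            self.n_tilings = n_tilings
            self.n_cells = n_cells
            self.lr = lr
            self.w = np.zeros(n_cells)       # cell weights
            self.counts = np.zeros(n_cells)  # how often each cell was updated
            rng = np.random.default_rng(seed)
            # random offsets so the tilings are shifted copies of one grid
            self.offsets = rng.random((n_tilings, 1))

        def _active_cells(self, state):
            """Hash each tiling's grid coordinates of `state` to one cell index."""
            state = np.asarray(state, dtype=float)
            cells = []
            for t in range(self.n_tilings):
                # assumed resolution: 0.1 per state dimension (scale factor 10)
                coords = np.floor(state * 10.0 + self.offsets[t]).astype(int)
                cells.append(hash((t, *coords.tolist())) % self.n_cells)
            return np.array(cells)

        def predict(self, state):
            return self.w[self._active_cells(state)].sum()

        def update(self, state, target):
            cells = self._active_cells(state)
            error = target - self.w[cells].sum()
            # credit of a cell = inverse of (1 + its past update count):
            # less-trained cells absorb more of the correction
            credit = 1.0 / (1.0 + self.counts[cells])
            credit /= credit.sum()
            self.w[cells] += self.lr * error * credit
            self.counts[cells] += 1.0

In a CACRL-style setup, one such approximator per discrete torque action could store Q(s, a) over the four-dimensional acrobot state (two joint angles and two angular velocities), with Sarsa or Q-learning supplying the update target; this usage is a plausible reading of the abstract, not a detail stated in it.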
