Implementation of End-to-End Training of Deep Visuomotor Policies for Manipulation of a Robotic Arm of Baxter Research Robot

Implementation of a Reinforcement Learning Algorithm for Vision-Based Deep Learning of Robot Arm Manipulation on the Baxter Robot

  • Received : 2018.12.13
  • Accepted : 2019.01.10
  • Published : 2019.02.28

Abstract

Reinforcement learning has been applied to various problems in robotics. However, it has remained hard to train complex robotic manipulation tasks because few models are applicable to general tasks, and such general models require a large number of training episodes. For these reasons, deep neural networks, which have been shown to be good function approximators, have not been actively used for robot manipulation tasks. Recently, some of these challenges have been addressed by a set of methods, such as Guided Policy Search, which guide or limit the search directions while training a deep neural network based policy model. These frameworks have already been applied to a humanoid robot, the PR2. However, in robotics, it is not trivial to adapt existing algorithms designed for one robot to another robot. In this paper, we present our implementation of Guided Policy Search for the robotic arms of the Baxter Research Robot. To meet the goals and needs of the project, we build on an existing implementation of a Baxter Agent class for the Guided Policy Search algorithm code, using the built-in Python interface. This work is expected to play an important role in popularizing reinforcement learning methods for robot manipulation on cost-effective robot platforms.

