
Evaluation of Human Demonstration Augmented Deep Reinforcement Learning Policies via Object Manipulation with an Anthropomorphic Robot Hand

  • Nahyeon Park (Department of Electronics and Information Convergence Engineering, Kyung Hee University) ;
  • Jiheon Oh (Department of Electronics and Information Convergence Engineering, Kyung Hee University) ;
  • Gahyeon Ryu (Department of Electronics and Information Convergence Engineering, Kyung Hee University) ;
  • Tae-Seong Kim (Department of Biomedical Engineering and Department of Electronics and Information Convergence Engineering, Kyung Hee University)
  • Received : 2020.12.18
  • Accepted : 2021.02.16
  • Published : 2021.05.31

Abstract

Manipulating complex objects with an anthropomorphic robot hand, as a human hand does, remains a challenge in human-centric environments. To train an anthropomorphic robot hand, which has a high number of degrees of freedom (DoF), policy optimization methods for deep reinforcement learning (DRL) augmented with human demonstrations have been proposed. In this work, we first show that augmenting DRL with human demonstrations is effective for object manipulation by comparing the performance of the augmentation-free Natural Policy Gradient (NPG) against Demonstration-Augmented NPG (DA-NPG). We then evaluate three DRL policy optimization methods, namely NPG, Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO), with demonstration augmentation (DA-NPG, DA-TRPO, and DA-PPO) and without it, on the manipulation of six objects: an apple, a banana, a bottle, a light bulb, a camera, and a hammer. The results show that DA-NPG achieved an average success rate of 99.33%, whereas NPG achieved only 60%. In addition, DA-NPG succeeded in grasping all six objects, while DA-TRPO and DA-PPO failed to grasp some of them and showed unstable performance.
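
In demonstration-augmented methods of this family (e.g., DAPG by Rajeswaran et al., 2018), the policy is typically first pretrained with behavior cloning on the demonstrations and then fine-tuned with a policy gradient that keeps a decaying demonstration term. The sketch below illustrates that kind of augmented loss; the function name, tensor arguments, and the hyperparameters lam0 and lam1 are illustrative assumptions, not code from this paper.

```python
# Minimal sketch of a demonstration-augmented policy-gradient loss in the
# style of DAPG. All names and hyperparameters are illustrative assumptions.
import torch

def da_policy_gradient_loss(log_probs_pi, advantages,   # on-policy rollout data
                            log_probs_demo,             # log-probs of demo actions
                            iteration, lam0=0.1, lam1=0.95):
    # Standard policy-gradient term over on-policy samples.
    pg_term = -(log_probs_pi * advantages.detach()).mean()

    # Demonstration term: demo actions are reinforced with a weight that
    # decays with the training iteration, so demonstrations guide early
    # exploration while on-policy experience dominates later.
    demo_weight = lam0 * (lam1 ** iteration) * advantages.detach().max()
    demo_term = -(log_probs_demo * demo_weight).mean()

    return pg_term + demo_term
```

The decaying weight is the key design choice: it lets the sparse, high-DoF grasping problem bootstrap from human data without the demonstrations constraining the final policy.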

For a robot to manipulate diverse and complex objects as a human does, object grasping with an anthropomorphic robot hand is essential. To train an anthropomorphic robot hand with a high number of degrees of freedom (DoF), reinforcement learning optimization methods augmented with human demonstrations have been proposed. In this study, we verify the effectiveness of behavior cloning by comparing the performance of NPG and Demonstration-Augmented Natural Policy Gradient (DA-NPG), and we evaluate the optimization methods DA-NPG, DA-Trust Region Policy Optimization (DA-TRPO), and DA-Proximal Policy Optimization (DA-PPO) through object manipulation tasks with an anthropomorphic robot hand on six objects. After training, a comparison of DA-NPG and NPG showed an average grasping success rate of 60% for NPG and 99.33% for DA-NPG, demonstrating that behavior cloning is effective for reinforcement learning of object manipulation with an anthropomorphic robot hand. In addition, DA-NPG showed performance similar to DA-TRPO while succeeding in grasping all objects, and it was the most stable. In contrast, DA-TRPO and DA-PPO failed to manipulate some objects and showed unstable performance. The method proposed in this study is expected to be useful for developing object-manipulation intelligence for anthropomorphic robot hands when applied to real anthropomorphic robots in the future.
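
For context, the three optimizers compared above differ mainly in how they constrain each policy update. Their standard textbook forms (not taken from this paper) are:

```latex
% NPG: precondition the gradient with the inverse Fisher information matrix
\theta_{k+1} = \theta_k + \alpha\, F(\theta_k)^{-1} \nabla_\theta J(\theta_k),
\qquad
F(\theta) = \mathbb{E}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\,
                              \nabla_\theta \log \pi_\theta(a \mid s)^{\top}\right]

% TRPO: maximize the surrogate objective under an explicit KL-divergence constraint
\max_\theta\; \mathbb{E}\!\left[\frac{\pi_\theta(a \mid s)}{\pi_{\theta_k}(a \mid s)}\,
                                A^{\pi_{\theta_k}}(s, a)\right]
\quad \text{s.t.} \quad
\mathbb{E}\!\left[D_{\mathrm{KL}}\!\left(\pi_{\theta_k} \,\Vert\, \pi_\theta\right)\right] \le \delta

% PPO: replace the hard constraint with a clipped probability ratio
\max_\theta\; \mathbb{E}\!\left[\min\!\left(r_\theta A,\;
    \operatorname{clip}(r_\theta,\, 1-\epsilon,\, 1+\epsilon)\, A\right)\right],
\qquad
r_\theta = \frac{\pi_\theta(a \mid s)}{\pi_{\theta_k}(a \mid s)}
```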


Acknowledgement

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Education, Science and Technology) in 2019 (2019R1A2C1003713). This research was also conducted as part of the Digital Content Core Technology Development Program funded by the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP-2017-0-00655).
