Funding
This work was supported by an IITP grant funded by the Korean government (MSIT) (No. 2018-0-00622).