Q-Learning Policy Design to Speed Up Agent Training

Yong, Sung-jung;Park, Hyo-gyeong;You, Yeon-hwi;Moon, Il-young;

doi:10.14702/JPEE.2022.219

Journal of Practical Engineering Education

Vol. 14, No. 1 / pp. 219-224 / 2022 / pISSN 2288-405X / eISSN 2288-4068

Korean Institute for Practical Engineering Education

Yong, Sung-jung (Department of Computer Science and Engineering, Korea University of Technology and Education) ;
Park, Hyo-gyeong (Department of Computer Science and Engineering, Korea University of Technology and Education) ;
You, Yeon-hwi (Department of Computer Science and Engineering, Korea University of Technology and Education) ;
Moon, Il-young (Department of Computer Science and Engineering, Korea University of Technology and Education)

Received: 2022.03.23
Reviewed: 2022.04.11
Published: 2022.04.30

https://doi.org/10.14702/JPEE.2022.219

Abstract

Q-Learning, widely used as a basic reinforcement learning algorithm, trains an agent to maximize reward through greedy actions: in each state, the agent selects the action with the largest expected reward among the actions available. In this paper, we study a policy that speeds up agent training with Q-Learning in the Frozen Lake 8×8 grid environment. We compare the training results of the conventional Q-Learning algorithm with those of an algorithm that assigns an attribute called 'direction' to the agent's actions. The analysis indicates that the proposed Q-Learning policy substantially improves both accuracy and training speed over the conventional algorithm.
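For readers unfamiliar with the baseline, the following is a minimal sketch of conventional tabular Q-Learning with epsilon-greedy action selection. It is not the policy proposed in the paper: the abstract does not specify how the 'direction' attribute is defined, so it is not modeled here. The 4×4 map, the deterministic (non-slippery) transitions, the function names, and the hyperparameters `alpha`, `gamma`, and `epsilon` are all assumptions chosen for brevity; the paper itself uses the Frozen Lake 8×8 environment.

```python
import random

# Hypothetical 4x4 map in the Frozen Lake style (an assumption, not the paper's 8x8 map):
# S = start, F = frozen tile, H = hole (episode ends, reward 0), G = goal (reward 1).
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N = len(GRID)
# Moves as (d_row, d_col), in the order left, down, right, up.
ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    row = min(max(row + d_row, 0), N - 1)  # moving into a wall keeps the agent in place
    col = min(max(col + d_col, 0), N - 1)
    cell = GRID[row][col]
    return row * N + col, float(cell == "G"), cell in "HG"


def greedy(q_row, rng):
    """Greedy action: pick an action with the largest Q-value, ties broken at random."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Conventional tabular Q-Learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore
            else:
                action = greedy(q[state], rng)        # exploit
            nxt, reward, done = step(state, action)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def rollout(q, max_steps=50):
    """Follow the greedy policy from the start state; returns (visited states, final reward)."""
    rng = random.Random(0)
    state, path = 0, [0]
    for _ in range(max_steps):
        state, reward, done = step(state, greedy(q[state], rng))
        path.append(state)
        if done:
            return path, reward
    return path, 0.0
```

Calling `rollout(train())` returns the sequence of visited states under the learned greedy policy together with the final reward. A comparison like the one described in the abstract would run the same loop with a modified action-selection rule and compare success rate and episodes to convergence.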

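Funding

This research is a result of the Regional Innovation Strategy (RIS) project based on local government-university cooperation, carried out with the support of the National Research Foundation of Korea (NRF) and funded by the Ministry of Education in 2021 (2021RIS-004).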

References

X. Wang, L. Jin, and H. Wei, "The shortest path planning based on reinforcement learning," Journal of Physics: Conference Series, vol. 1584, 012006, 2020. https://doi.org/10.1088/1742-6596/1584/1/012006
R. S. Sutton and A. G. Barto, "Reinforcement learning: an introduction," MIT Press Cambridge, vol. 135, 1998.
C. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, pp. 279-292, May 1992.
V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing atari with deep reinforcement learning," Proceedings of the 2013 Conference on Neural Information Processing Systems Deep Learning Workshop, California, USA, 2013.
J. Clifton and E. Laber, "Q-learning: theory and applications," Annual Review of Statistics and Its Application, vol. 7, pp. 279-301, 2020. https://doi.org/10.1146/annurev-statistics-031219-041220
G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym," Jun. 2016, arXiv [Online]. Available: https://arxiv.org/abs/1606.01540v1.
