
Flight Trajectory Simulation via Reinforcement Learning in Virtual Environment


  • Received : 2018.06.15
  • Accepted : 2018.09.27
  • Published : 2018.12.30

Abstract

The most common way to use artificial intelligence to control an object so that it reaches a target point is reinforcement learning. Until recently, however, reinforcement learning required complicated calculations that were difficult to implement. In this paper, we use the Proximal Policy Optimization (PPO) algorithm, which improves on this, to simulate finding a planned flight trajectory that reaches a target point in a virtual environment. In addition, to examine how external environmental factors affect flight-trajectory learning, variables such as changes in the trajectory, the effect of the reward value, and external wind are added, and their influence on trajectory-learning performance and learning speed is compared and analyzed. The simulation results show that the agent can find the planned trajectory despite various changes in the external environment, suggesting that the method can be applied to an actual vehicle.

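For readers unfamiliar with PPO, its key simplification over earlier trust-region methods [1] is that the hard-to-implement constrained update is replaced by a clipped surrogate objective [2]. The sketch below is a minimal illustration of that objective in Python/PyTorch; the function name, tensor shapes, and clip value are illustrative assumptions, not the authors' code (in the paper, training itself is handled by the Unity ML-Agents toolkit [3]).

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective of PPO (Schulman et al., 2017) [2].

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is computed in
    log space for numerical stability; clipping r to [1-eps, 1+eps]
    keeps each update close to the old policy, replacing TRPO's
    second-order trust-region computation with a first-order one.
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The surrogate is maximized, so the loss is its negation.
    return -torch.min(unclipped, clipped).mean()

# Toy usage: random tensors stand in for a batch of transitions.
if __name__ == "__main__":
    torch.manual_seed(0)
    logp_old = torch.randn(64)                    # log-probs under old policy
    logp_new = logp_old + 0.1 * torch.randn(64)   # log-probs after an update
    advantages = torch.randn(64)                  # e.g., GAE estimates
    print(ppo_clip_loss(logp_new, logp_old, advantages))
```

Because this objective needs only first-order gradients, it is straightforward to optimize inside game-engine environments such as the virtual flight environment used here.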

Keywords


Fig. 1. Configuration of objects


Fig. 2. Learning cycle


Fig. 3. Simulation trajectories


Fig. 4. Wind-applied trajectory


Fig. 5. Test result graph


Fig. 6. Wind-applied test result graph

Table 1. Hyperparameters of Unity ML-Agents


Table 2. Training statistics


Table 3. Simulation results


Table 4. Wind-applied simulation results


References

  1. John Schulman, Sergey Levine, Philipp Moritz, Michael Jordan, Pieter Abbeel (2015), Trust Region Policy Optimization, arXiv:1502.05477v5 [cs.LG].
  2. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov (2017), Proximal Policy Optimization Algorithms, arXiv:1707.06347v2 [cs.LG].
  3. Vincent Pierre (2017), Unity ML-Agents, https://github.com/Unity-Technologies/ml-agents
  4. Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick (2017), Mask R-CNN, arXiv:1703.06870 [cs.CV]
  5. Jemin Hwangbo, Inkyu Sa, Roland Siegwart, Marco Hutter (2017), Control of a Quadrotor with Reinforcement Learning, arXiv:1707.05110v1 [cs.RO] https://doi.org/10.1109/LRA.2017.2720851
  6. Huy X. Pham, Hung. M. La, David Feil-Seifer, Luan V. Nguyen (2018), Autonomous UAV Navigation Using Reinforcement Learning, arXiv:1801.05086v1 [cs.RO]
  7. William Koch, Renato Mancuso, Richard West, Azer Bestavros (2018), Reinforcement Learning for UAV Attitude Control, arXiv:1804.04154v1 [cs.RO].
  8. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. (2016), Mastering the Game of Go with Deep Neural Networks and Tree Search, Nature, vol. 529, no. 7587, pp. 484-489. https://doi.org/10.1038/nature16961
  9. Sung-Pil Kim (2016), First Steps in Deep Learning (딥러닝 첫걸음), Hanbit Media, Seoul, pp. 17-33.