Comparing State Representation Techniques for Reinforcement Learning in Autonomous Driving

  • 안지환 (Department of Computer Software, Hanyang University)
  • 권태수 (Department of Computer Software, Hanyang University)
  • Received : 2024.06.15
  • Accepted : 2024.07.05
  • Published : 2024.07.25

Abstract

Research into vision-based end-to-end autonomous driving systems utilizing deep learning and reinforcement learning has been steadily increasing. These systems typically encode continuous, high-dimensional vehicle states, such as location, velocity, orientation, and sensor data, into latent features, which are then decoded into a vehicle control policy. The complexity of urban driving environments necessitates state representation learning through networks such as Variational Autoencoders (VAEs) or Convolutional Neural Networks (CNNs). This paper analyzes the impact of different image state encoding methods on reinforcement learning performance in autonomous driving. Experiments were conducted in the CARLA simulator using RGB images and semantically segmented images captured by the vehicle's front camera. These images were encoded using VAE and Vision Transformer (ViT) networks. The study examines how these networks influence the agents' learning outcomes and experimentally demonstrates the role of each state representation technique in enhancing the learning efficiency and decision-making capabilities of autonomous driving systems.
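To make the encode-then-decode pipeline in the abstract concrete, the sketch below shows one way the image-to-latent-state step could be implemented. It is a minimal illustration only, assuming PyTorch, 64x64 input frames, a small convolutional VAE encoder, and a 64-dimensional latent vector; the class and parameter names (ConvVAEEncoder, latent_dim, the layer sizes) are illustrative choices, not the authors' actual architecture.

    import torch
    import torch.nn as nn

    class ConvVAEEncoder(nn.Module):
        """Illustrative VAE encoder: maps a front-camera image to a latent state vector."""
        def __init__(self, in_channels=3, latent_dim=64):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(in_channels, 32, 4, stride=2), nn.ReLU(),   # 64x64 -> 31x31
                nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),            # -> 14x14
                nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),           # -> 6x6
                nn.Conv2d(128, 256, 4, stride=2), nn.ReLU(),          # -> 2x2
                nn.Flatten(),
            )
            self.fc_mu = nn.Linear(256 * 2 * 2, latent_dim)
            self.fc_logvar = nn.Linear(256 * 2 * 2, latent_dim)

        def forward(self, x):
            h = self.conv(x)
            mu, logvar = self.fc_mu(h), self.fc_logvar(h)
            # Reparameterization trick: sample z ~ N(mu, sigma^2) while keeping gradients.
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            return z, mu, logvar

    # Example: encode a batch of RGB (or one-hot semantic) camera frames and use
    # the latent mean as the compact observation handed to the RL policy.
    encoder = ConvVAEEncoder(in_channels=3, latent_dim=64)
    frames = torch.rand(8, 3, 64, 64)   # placeholder for CARLA front-camera images
    with torch.no_grad():
        _, mu, _ = encoder(frames)
    state = mu                          # shape (8, 64): low-dimensional RL state

In the ViT variant the abstract describes, the frame would instead be split into fixed-size patches, and the transformer encoder's class-token (or pooled patch) embedding would play the same role as the latent vector here.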

