Multi-Agent Reinforcement Learning based Swarm Drone using QPLEX and PER

  • Jin-Ho Ahn (Intelligence Software Team, Hanwha Systems Co., Ltd.);
  • Byung-In Choi (Intelligence Software Team, Hanwha Systems Co., Ltd.);
  • Tae-Young Lee (Intelligence Software Team, Hanwha Systems Co., Ltd.);
  • Hae-Moon Kim (Intelligence Software Team, Hanwha Systems Co., Ltd.);
  • Hyun-Hak Kim (Intelligence Software Team, Hanwha Systems Co., Ltd.)
  • Received : 2024.10.11
  • Accepted : 2024.11.01
  • Published : 2024.11.29

Abstract

With the advancement of unmanned aerial vehicle technology, swarm drones are increasingly deployed across various domains, including disaster response and military operations, with their effectiveness particularly pronounced in the military setting. Swarm drones leverage real-time data sharing and decision-making to execute tactical missions, making collaborative behavior essential in complex battlefield environments. However, traditional rule-based behavior mechanisms break down as environmental complexity grows. This paper examines the applicability of multi-agent reinforcement learning (MARL) to swarm drone models and proposes strategies to raise their mission success rates. By combining QPLEX with Prioritized Experience Replay (PER), we present methods that improve learning efficiency. Validation on the SMACv2 simulator shows that the proposed approach converges faster and achieves higher mission success rates than existing MARL algorithms.
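
This page carries no implementation details beyond the abstract, so the sketch below is purely illustrative: a minimal proportional Prioritized Experience Replay buffer in the spirit of reference [7], which the proposed method combines with QPLEX. The class name, hyperparameter values (alpha, beta, eps), and the NumPy-based design are assumptions for exposition, not the authors' code.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Illustrative proportional PER buffer (after Schaul et al., 2016, ref. [7]).

    Transitions are sampled with probability P(i) = p_i^alpha / sum_k p_k^alpha,
    and the resulting bias is corrected with importance-sampling weights
    w_i = (N * P(i))^(-beta), normalized by their maximum.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha    # how strongly priorities skew sampling (0 = uniform)
        self.beta = beta      # strength of the importance-sampling correction
        self.eps = eps        # keeps every priority strictly positive
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so that each is
        # sampled at least once before its TD error is known.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        prios = self.priorities[: len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idxs = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights undo the bias of non-uniform replay.
        weights = (len(self.buffer) * probs[idxs]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.buffer[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # Priority is |TD error| + eps, so surprising transitions recur more often.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a QPLEX-style training loop, the absolute TD errors from the mixing network's loss would be fed back through update_priorities after each gradient step; this replay weighting is the PER component the abstract credits for the improved learning efficiency.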

References

  1. Seong hyun Yoo, Chun ki Ahn, Jung hun Kim, "Technology and Development of Drones," The Korea Institute of Electrical Engineers, Vol. 66, No. 2, pp. 19-23, February 2017.
  2. Seung ho Cheon, Kyung soo Kim, "Defense & Technology," Korea Defense Industry Association, pp. 140-153, January 2023.
  3. Tae Joon Park, Yerim Chung, "Multi Objective Vehicle and Drone Routing Problem with Time Window," Journal of The Korea Society of Computer and Information, Vol. 24, No. 1, pp. 167-178, January 2019.
  4. Chang-Hun Ji, Youn-Hee Han, Sung-Tae Moon, "Real-Time Hierarchical Deep Reinforcement Learning-Based Drone Trajectory Generation Algorithm Parameter Control," The Journal of Korean Institute of Communications and Information Sciences, Vol. 48, No. 10, October 2023.
  5. Seung won Do, Sung woo Jun, Jae hwan Kim, "Hierarchical Structure Multi-agent Reinforcement Learning for Presenting Battlefield Strategy and Tactics," Summer Annual Conference of IEIE, pp. 3393-3395, Jeju, Korea, June 2024.
  6. Jianhao Wang, Zhizhou Ren, Terry Liu, Yang Yu, Chongjie Zhang, "QPLEX: Duplex Dueling Multi-Agent Q-Learning," In International Conference on Learning Representations (ICLR), 2021. arXiv:2008.01062.
  7. T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized Experience Replay," In ICLR, 2016. arXiv:1511.05952.
  8. Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, Shimon Whiteson, "The StarCraft Multi-Agent Challenge," December 2019. arXiv:1902.04043.
  9. Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, Yi Wu, "The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games," 36th Conference on Neural Information Processing Systems (NeurIPS 2022), Datasets and Benchmarks Track. arXiv:2103.01955.
  10. Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel, "Value-Decomposition Networks For Cooperative Multi-Agent Learning," June 2017. arXiv:1706.05296.
  11. Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson, "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning," June 2018. arXiv:1803.11485.
  12. Tonghan Wang, Jianhao Wang, Chongyi Zheng, Chongjie Zhang, "Learning Nearly Decomposable Value Functions Via Communication Minimization," July 2020. arXiv:1910.05366.
  13. Benjamin Ellis, Jonathan Cook, Skander Moalla, Mikayel Samvelyan, Mingfei Sun, Anuj Mahajan, Jakob N. Foerster, Shimon Whiteson, "SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning," October 2023. arXiv:2212.07489.
  14. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller, "Playing Atari with Deep Reinforcement Learning," December 2013. arXiv:1312.5602.