• 제목/요약/키워드: Multi-Agent Advantage Actor-Critic

검색결과 2건 처리시간 0.016초

Intelligent Warehousing: Comparing Cooperative MARL Strategies

  • Yosua Setyawan Soekamto;Dae-Ki Kang
    • International Journal of Internet, Broadcasting and Communication
    • /
    • 제16권3호
    • /
    • pp.205-211
    • /
    • 2024
  • Effective warehouse management requires advanced resource planning to optimize profits and space. Robots offer a promising solution, but their effectiveness relies on embedded artificial intelligence. Multi-agent reinforcement learning (MARL) enhances robot intelligence in these environments. This study explores various MARL algorithms using the Multi-Robot Warehouse Environment (RWARE) to determine their suitability for warehouse resource planning. Our findings show that cooperative MARL is essential for effective warehouse management. IA2C outperforms MAA2C and VDA2C on smaller maps, while VDA2C excels on larger maps. IA2C's decentralized approach, focusing on cooperation over collaboration, allows for higher reward collection in smaller environments. However, as map size increases, reward collection decreases due to the need for extensive exploration. This study highlights the importance of selecting the appropriate MARL algorithm based on the specific warehouse environment's requirements and scale.

Multi-Agent Deep Reinforcement Learning for Fighting Game: A Comparative Study of PPO and A2C

  • Yoshua Kaleb Purwanto;Dae-Ki Kang
    • International Journal of Internet, Broadcasting and Communication
    • /
    • 제16권3호
    • /
    • pp.192-198
    • /
    • 2024
  • This paper investigates the application of multi-agent deep reinforcement learning in the fighting game Samurai Shodown using Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C) algorithms. Initially, agents are trained separately for 200,000 timesteps using Convolutional Neural Network (CNN) and Multi-Layer Perceptron (MLP) with LSTM networks. PPO demonstrates superior performance early on with stable policy updates, while A2C shows better adaptation and higher rewards over extended training periods, culminating in A2C outperforming PPO after 1,000,000 timesteps. These findings highlight PPO's effectiveness for short-term training and A2C's advantages in long-term learning scenarios, emphasizing the importance of algorithm selection based on training duration and task complexity. The code can be found in this link https://github.com/Lexer04/Samurai-Shodown-with-Reinforcement-Learning-PPO.