Avoiding collaborative paradox in multi-agent reinforcement learning

  • Kim, Hyunseok (Intelligent Convergence Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Lee, Donghun (Intelligent Convergence Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Jang, Ingook (Intelligent Convergence Research Laboratory, Electronics and Telecommunications Research Institute)
  • Received : 2021.01.14
  • Accepted : 2021.05.26
  • Published : 2021.12.01

Abstract

Productive collaboration among multiple agents has become an emerging issue in real-world applications. In reinforcement learning, multi-agent environments present challenges beyond those that are tractable in single-agent settings. Such collaborative environments have highly complex attributes: sparse rewards for task completion, limited communication between agents, and only partial observations. In particular, adjustments in one agent's action policy make the environment nonstationary from the other agents' perspective, which causes high variance in the learned policies and prevents the direct use of single-agent reinforcement learning approaches. Unexpected social loafing caused by this high dispersion makes it difficult for all agents to succeed in collaborative tasks. We therefore address the paradox in which social loafing significantly reduces total returns after a certain timestep of multi-agent reinforcement learning. We further demonstrate that this collaborative paradox can be avoided with our proposed early stopping method, which leverages a metric for social loafing.
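
The early stopping idea in the abstract can be illustrated with a small sketch. The snippet below is not the authors' implementation: the social-loafing metric used here (coefficient of variation of per-agent episode returns), the class name, the threshold, and the patience window are all illustrative assumptions. It simply halts training once the metric stays above a threshold for several consecutive evaluations.

```python
# Minimal sketch of an early-stop monitor for multi-agent training.
# NOTE: this is not the paper's implementation. The social-loafing metric
# (coefficient of variation of per-agent episode returns), the threshold,
# and the patience window are illustrative assumptions only.
from collections import deque
from statistics import mean, pstdev


class SocialLoafingEarlyStopper:
    """Stops training when per-agent returns stay highly dispersed."""

    def __init__(self, threshold: float = 0.5, patience: int = 2):
        self.threshold = threshold            # maximum tolerated dispersion
        self.patience = patience              # consecutive violations allowed
        self.violations = deque(maxlen=patience)

    @staticmethod
    def dispersion(agent_returns):
        """Coefficient of variation of per-agent returns (stand-in metric)."""
        mu = mean(agent_returns)
        if abs(mu) < 1e-8:
            return 0.0
        return pstdev(agent_returns) / abs(mu)

    def should_stop(self, agent_returns):
        """True once dispersion exceeded the threshold `patience` times in a row."""
        self.violations.append(self.dispersion(agent_returns) > self.threshold)
        return len(self.violations) == self.patience and all(self.violations)


# Usage: feed per-agent returns after each evaluation rollout.
stopper = SocialLoafingEarlyStopper(threshold=0.5, patience=2)
evaluations = [[10.0, 9.8, 10.1],   # balanced contributions
               [10.0, 1.0, 9.5],    # one agent starts loafing
               [10.5, 0.5, 9.8]]    # loafing persists
for step, returns in enumerate(evaluations):
    if stopper.should_stop(returns):
        print(f"early stop at evaluation {step}")  # prints at evaluation 2
        break
```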

Acknowledgement

This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government [21ZR1100, A Study of Hyper-Connected Thinking Internet Technology by autonomous connecting, controlling, and evolving ways].
