Distributed multi-agent deep reinforcement learning for cooperative multi-robot pursuit

Chao Yu; Yinzhao Dong; Yangning Li; Yatong Chen

Author(s): Chao Yu; Yinzhao Dong; Yangning Li; Yatong Chen
- Affiliations: 1: School of Data and Computer Science, Sun Yat-Sen University, 510006, Guangzhou, People's Republic of China;
  2: School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, People's Republic of China
Source: The Journal of Engineering, 0pp.
DOI: 10.1049/joe.2019.1200, Available online: 31 July 2020

This is an open access article published by the IET under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/)

Received 14/10/2019, Accepted 19/11/2019, Published 13/01/2020

As a popular research topic in the area of distributed artificial intelligence, the multi-robot pursuit problem is widely used as a testbed for evaluating coordinated and cooperative strategies in multi-robot systems. In this study, the multi-robot pursuit game is studied using reinforcement learning (RL) techniques. Unlike most existing studies that apply fully centralised deep RL methods based on the centralised-learning and decentralised-execution scheme, the authors propose a fully decentralised multi-agent deep RL approach by modelling each agent as an individual deep RL agent that has its own individual learning system (i.e. individual action-value function, individual learning update process, and individual action output). To realise coordination among agents, limited information about the other agents in the environment is used as input to the learning process. Experimental results show that both the distributed and centralised approaches can ultimately solve the pursuit-evasion problem in different dimensions, but the learning efficiency and coordination performance of the proposed distributed approach are much better than those of the traditional centralised approach.
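
The decentralised structure described above (one action-value function, one update process, and one action output per agent, with limited information about other agents as input) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a tabular Q-function in place of a deep network, a small grid world, a randomly moving evader, and a shared capture reward. All names (`IndependentQAgent`, `observe`, `step`, `train`) and all parameter values are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical action set: stay, up, down, right, left.
ACTIONS = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]


class IndependentQAgent:
    """One pursuer with its own Q-table, update rule, and action selection."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1, seed=None):
        # Tabular stand-in for the deep action-value network (assumption).
        self.q = defaultdict(lambda: [0.0] * len(ACTIONS))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, obs):
        # Epsilon-greedy over this agent's own value estimates.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        values = self.q[obs]
        return values.index(max(values))

    def update(self, obs, action, reward, next_obs, done):
        # Standard one-step Q-learning update on this agent's table only.
        target = reward if done else reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])


def observe(i, pursuers, evader):
    """Local observation: offset to the evader and to the nearest teammate.

    Requires at least two pursuers.
    """
    px, py = pursuers[i]
    others = [p for j, p in enumerate(pursuers) if j != i]
    mate = min(others, key=lambda p: abs(p[0] - px) + abs(p[1] - py))
    return (evader[0] - px, evader[1] - py, mate[0] - px, mate[1] - py)


def clamp(v, size):
    return min(max(v, 0), size - 1)


def step(pursuers, evader, actions, size, rng):
    """Move all pursuers, then the evader at random; capture if cells coincide."""
    moved = [(clamp(x + ACTIONS[a][0], size), clamp(y + ACTIONS[a][1], size))
             for (x, y), a in zip(pursuers, actions)]
    dx, dy = rng.choice(ACTIONS)
    evader = (clamp(evader[0] + dx, size), clamp(evader[1] + dy, size))
    return moved, evader, any(p == evader for p in moved)


def train(episodes=200, size=5, n_pursuers=2, max_steps=50, seed=0):
    rng = random.Random(seed)
    agents = [IndependentQAgent(seed=seed + i) for i in range(n_pursuers)]
    captures = 0
    for _ in range(episodes):
        pursuers = [(rng.randrange(size), rng.randrange(size))
                    for _ in range(n_pursuers)]
        evader = (rng.randrange(size), rng.randrange(size))
        for _ in range(max_steps):
            obs = [observe(i, pursuers, evader) for i in range(n_pursuers)]
            actions = [agent.act(o) for agent, o in zip(agents, obs)]
            pursuers, evader, done = step(pursuers, evader, actions, size, rng)
            reward = 1.0 if done else -0.01  # shared team reward (assumption)
            next_obs = [observe(i, pursuers, evader) for i in range(n_pursuers)]
            # Each agent learns separately from its own observation and action.
            for agent, o, a, o2 in zip(agents, obs, actions, next_obs):
                agent.update(o, a, reward, o2, done)
            if done:
                captures += 1
                break
    return agents, captures


if __name__ == "__main__":
    _, captures = train()
    print("episodes ending in capture:", captures)
```

No parameters, replay buffers, or gradients are shared between agents in this sketch. The only coupling is the teammate offset in each observation and the common reward signal.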

