Output feedback reinforcement learning based optimal output synchronisation of heterogeneous discrete-time multi-agent systems

This study proposes a model-free distributed output feedback control scheme that synchronises the outputs of heterogeneous follower agents with the output of the leader agent in a directed network. A distributed two-degree-of-freedom approach is presented that separates the learning of the optimal output feedback term and the feedforward term of each agent's local control law. The local feedback parameters are learned using the proposed off-policy Q-learning algorithm, whereas a gradient adaptive law is presented to learn the local feedforward control parameters that achieve asymptotic tracking for each agent. This learning scheme and the resulting distributed control laws neither require access to the agents' local internal states nor need an additional distributed leader state observer. The proposed approach has the advantage over previous state-augmentation approaches that it circumvents the need to introduce a discounting factor in the local performance functions. It is shown that the proposed algorithm converges to the optimal solution of the algebraic Riccati equation and the output regulator equations without explicitly solving them, provided the leader agent is reachable, directly or indirectly, from all the follower agents. Simulation results validate the proposed scheme.
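The core model-free ingredient described above — policy iteration on a quadratic Q-function, so that the algebraic Riccati equation is never solved explicitly — can be sketched for a single agent. The sketch below is the single-agent, state-feedback Q-learning building block (in the spirit of Bradtke–Ydstie–Barto policy iteration), not the paper's distributed output-feedback scheme; the system matrices `A`, `B`, the weights `Q`, `R`, and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative discrete-time agent x_{k+1} = A x_k + B u_k (not from the paper)
# with quadratic stage cost x'Qx + u'Ru.
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.eye(1)
n, m = 2, 1

def phi(z):
    """Quadratic basis for the symmetric Q-function kernel H (upper triangle)."""
    outer = np.outer(z, z)
    idx = np.triu_indices(n + m)
    w = np.where(idx[0] == idx[1], 1.0, 2.0)   # off-diagonal terms appear twice
    return w * outer[idx]

# Off-policy data: states and exploratory inputs sampled once, reused at every
# policy-iteration step (the behaviour policy never changes).
N = 60
X  = rng.standard_normal((N, n))
U  = rng.standard_normal((N, m))
Xp = X @ A.T + U @ B.T                          # x_{k+1} = A x_k + B u_k

K = np.zeros((m, n))                            # A is Schur stable, so K0 = 0 is stabilising
for _ in range(15):
    # Policy evaluation: Q_K(x,u) = x'Qx + u'Ru + Q_K(x+, -K x+) in least squares.
    Phi = np.array([phi(np.concatenate([x, u])) -
                    phi(np.concatenate([xp, -K @ xp]))
                    for x, u, xp in zip(X, U, Xp)])
    r = np.einsum('ki,ij,kj->k', X, Q, X) + np.einsum('ki,ij,kj->k', U, R, U)
    theta, *_ = np.linalg.lstsq(Phi, r, rcond=None)

    # Rebuild the symmetric kernel H from its upper-triangle parameters.
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = theta
    H = H + H.T - np.diag(np.diag(H))

    # Policy improvement: u = -K x with K = H_uu^{-1} H_ux.
    K = np.linalg.solve(H[n:, n:], H[n:, :n])

print("learned gain K =", K)
```

With exact data and full-rank regressors, each evaluation step recovers the kernel of the current policy exactly, so the iteration reproduces the usual Riccati recursion without ever forming `A` and `B` inside the learner — the same mechanism the paper exploits, extended there to output feedback and a network of agents.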

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cta.2018.6266