
Single-network ADP for near optimal control of continuous-time zero-sum games without using initial stabilising control laws

IET Control Theory & Applications

This study develops a near-optimal critic learning algorithm, based on single-network adaptive dynamic programming (ADP), for solving continuous-time two-player zero-sum games without requiring an initial stabilising control policy. ‘Single-network’ refers to a single critic neural network, which approximates the value function and thereby yields the saddle-point equilibrium of the zero-sum differential game. First, the authors formulate the two-player zero-sum game problem mathematically and analyse the similarity between the zero-sum game problems for linear and non-linear systems. The adaptive learning scheme is then implemented as a critic structure that derives the control and disturbance policies by learning the optimal value function, and a novel weight tuning law involving a stable operator is proposed to ensure convergence and stability. Moreover, uniformly ultimately bounded (UUB) stability of the whole system is rigorously proved using Lyapunov theory. Finally, simulation results confirm the effectiveness of the improved approximate optimal control technique on a complex linear system and a non-linear system.
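
For orientation, the problem class named in the abstract can be sketched in its standard textbook form. The block below is not taken from the article: the symbols f, g, k, Q, R, \gamma, \phi and \hat{W} are assumed notation for an input-affine system with control u and disturbance d, and the article's own weight tuning law with its stable operator is not reproduced here.

```latex
% Assumed notation: input-affine dynamics with control u and disturbance d.
\begin{align}
  \dot{x} &= f(x) + g(x)\,u + k(x)\,d, \\
  V^{*}(x_0) &= \min_{u}\max_{d} \int_{0}^{\infty}
      \bigl( Q(x) + u^{\top} R\,u - \gamma^{2} d^{\top} d \bigr)\,\mathrm{d}t .
\end{align}
% Stationarity of the Hamiltonian gives the saddle-point policies:
\begin{align}
  u^{*}(x) &= -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V^{*}(x), &
  d^{*}(x) &= \tfrac{1}{2\gamma^{2}}\, k(x)^{\top} \nabla V^{*}(x),
\end{align}
% and substituting them back yields the Hamilton-Jacobi-Isaacs (HJI) equation:
\begin{equation}
  0 = Q(x) + \nabla V^{*\top} f(x)
      - \tfrac{1}{4}\, \nabla V^{*\top} g(x) R^{-1} g(x)^{\top} \nabla V^{*}
      + \tfrac{1}{4\gamma^{2}}\, \nabla V^{*\top} k(x) k(x)^{\top} \nabla V^{*} .
\end{equation}
% A single critic network approximates the value function, and both
% policies are read off its gradient (\nabla\phi is the Jacobian of \phi):
\begin{align}
  \hat{V}(x) &= \hat{W}^{\top} \phi(x), &
  \hat{u}(x) &= -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla\phi(x)^{\top} \hat{W}, &
  \hat{d}(x) &= \tfrac{1}{2\gamma^{2}}\, k(x)^{\top} \nabla\phi(x)^{\top} \hat{W} .
\end{align}
% Linear special case: f(x) = A x, g = B, k = D, Q(x) = x^{\top} Q x and
% V^{*}(x) = x^{\top} P x reduce the HJI equation to the game algebraic
% Riccati equation
\begin{equation}
  A^{\top} P + P A + Q - P B R^{-1} B^{\top} P
      + \gamma^{-2} P D D^{\top} P = 0 .
\end{equation}
```

In this sketch the linear and non-linear problems share the same structure, which is presumably the similarity the abstract refers to: the Riccati equation is the HJI equation restricted to a quadratic value function, so one critic can in principle serve both cases.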

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cta.2018.5464