
Neural-network-based online optimal control for uncertain non-linear continuous-time systems with control constraints

IET Control Theory & Applications


In this study, an online adaptive optimal control scheme is developed for the infinite-horizon optimal control problem of uncertain non-linear continuous-time systems whose control policy is subject to saturation constraints. A novel identifier-critic architecture is presented to approximate the solution of the Hamilton–Jacobi–Bellman equation using two neural networks (NNs): an identifier NN estimates the uncertain system dynamics, and a critic NN is used to derive the optimal control, replacing the action–critic dual networks typically employed in reinforcement learning. Based on this architecture, the identifier NN and the critic NN are tuned simultaneously. Unlike policy iteration, which requires an initial stabilising control, the scheme imposes no special requirement on the initial control. Moreover, by using Lyapunov's direct method, the weights of the identifier NN and the critic NN are guaranteed to be uniformly ultimately bounded, while the closed-loop system remains stable. Finally, an example is provided to demonstrate the effectiveness of the proposed approach.
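
The abstract does not give the control law itself, so the following is only an illustrative sketch of how a single critic can yield a saturated control without a separate action network. It assumes the bounded-control form commonly used in the constrained-input HJB literature, u = -λ tanh(gᵀ∇V / (2λr)), a scalar input, and a critic that is linear in its weights over a quadratic basis. The names `critic_gradient`, `saturated_control`, `bound` and `r`, the basis and the weight values are all hypothetical and are not taken from the article.

```python
import math


def critic_gradient(weights, x):
    """Gradient of the assumed critic V(x) = w1*x1^2 + w2*x1*x2 + w3*x2^2."""
    w1, w2, w3 = weights
    x1, x2 = x
    return (2.0 * w1 * x1 + w2 * x2, w2 * x1 + 2.0 * w3 * x2)


def saturated_control(weights, x, g, bound, r=1.0):
    """Assumed bounded control: u = -bound * tanh(g^T grad V / (2 * bound * r)).

    weights: critic weights (hypothetical values in the example below)
    x:       current state (x1, x2)
    g:       input-gain vector g(x) evaluated at x, assumed known here
    bound:   saturation limit, so |u| never exceeds bound
    r:       positive control weighting in the cost
    """
    grad = critic_gradient(weights, x)
    inner = sum(gi * di for gi, di in zip(g, grad))
    return -bound * math.tanh(inner / (2.0 * bound * r))


if __name__ == "__main__":
    w = (1.0, 0.0, 1.0)  # illustrative critic weights, not learned
    for state in [(0.0, 0.0), (1.0, 1.0), (50.0, 50.0)]:
        u = saturated_control(w, state, g=(0.0, 1.0), bound=0.5)
        print(state, u)
```

Because the hyperbolic tangent is bounded, the computed control stays within the saturation limit for any critic weights, which is why a form like this is convenient when the weights are still being tuned online.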

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cta.2013.0472