Online approximate optimal control for affine non-linear systems with unknown internal dynamics using adaptive dynamic programming

Xiong Yang; Derong Liu; Qinglai Wei

Author(s): Xiong Yang¹; Derong Liu¹; Qinglai Wei¹
Affiliations: ¹ State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
Source: IET Control Theory & Applications, Volume 8, Issue 16, 06 November 2014, pp. 1676–1688
DOI: 10.1049/iet-cta.2014.0186, Print ISSN 1751-8644, Online ISSN 1751-8652

Received 19/02/2014, Revised 12/05/2014, Accepted 06/06/2014, Published 12/08/2014

In this study, a novel online adaptive dynamic programming (ADP)-based algorithm is developed for solving the optimal control problem of affine non-linear continuous-time systems with unknown internal dynamics. The algorithm employs an observer–critic architecture to approximate the solution of the Hamilton–Jacobi–Bellman equation. Two neural networks (NNs) are used in this architecture: an NN state observer is constructed to estimate the unknown system dynamics, and a single critic NN is designed to derive the optimal control, replacing the action–critic dual networks typically employed in traditional ADP algorithms. Based on the developed architecture, the observer NN and the critic NN are tuned simultaneously. Unlike existing tuning laws for the critic, the newly developed critic update rule not only ensures convergence of the critic to the optimal control but also guarantees stability of the closed-loop system. No initial stabilising control is required, and by using recorded and instantaneous data simultaneously for the adaptation of the critic, the restrictive persistence of excitation condition is relaxed. In addition, Lyapunov's direct method is utilised to demonstrate the uniform ultimate boundedness of the weights of the observer NN and the critic NN. Finally, an example is provided to verify the effectiveness of the approach.
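The idea of tuning a critic with recorded and instantaneous data together can be illustrated on a toy problem. The following is a minimal sketch only, not the update rule of the paper: it assumes a hypothetical scalar linear plant with known dynamics (so no NN observer is involved), a one-weight critic V(x) = w·x², and a generic normalised-gradient descent on the Hamilton–Jacobi–Bellman residual. All names and constants (A, B, Q, R, alpha, the recorded states) are illustrative assumptions.

```python
import math

# Hypothetical scalar plant (not from the paper): x_dot = A*x + B*u,
# with running cost Q*x^2 + R*u^2. The dynamics are assumed known here.
A, B, Q, R = -1.0, 1.0, 1.0, 1.0


def hjb_terms(w, x):
    """Return (HJB residual, regressor, control) for the critic V(x) = w*x^2."""
    u = -(B / R) * w * x                 # control computed from the critic alone
    sigma = 2.0 * x * (A * x + B * u)    # time derivative of the feature x^2
    residual = Q * x**2 + R * u**2 + w * sigma
    return residual, sigma, u


def normalised_gradient(w, x):
    """Gradient of 0.5*residual^2 with respect to w (control held fixed), normalised."""
    residual, sigma, _ = hjb_terms(w, x)
    return residual * sigma / (1.0 + sigma**2) ** 2


def train_critic(x0=1.0, w0=0.0, recorded=(0.5, 1.0, 2.0),
                 alpha=5.0, dt=0.01, steps=5000):
    """Euler-integrate the plant and the critic weight simultaneously."""
    w, x = w0, x0
    for _ in range(steps):
        grad = normalised_gradient(w, x)                              # instantaneous data
        grad += sum(normalised_gradient(w, xj) for xj in recorded)    # recorded data
        _, _, u = hjb_terms(w, x)
        w -= alpha * dt * grad
        x += dt * (A * x + B * u)
    return w, x


# Exact optimum for this toy problem: positive root of w^2 + 2*w - 1 = 0.
W_STAR = math.sqrt(2.0) - 1.0

if __name__ == "__main__":
    w, x = train_critic()
    print("learned weight:", w, "optimal weight:", W_STAR)
```

In this sketch the recorded states keep the gradient informative after the live state has decayed towards zero, which is the intuition behind relaxing the persistence of excitation condition; calling `train_critic(x0=0.0)` shows the weight still moving to the optimum with no excitation from the trajectory at all. The toy plant is open-loop stable, so the sketch does not exercise the stability-guaranteeing term described in the abstract.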
