Optimal tracking control of non-linear systems with unknown dynamics is an important and challenging problem. Building on the frameworks of reinforcement learning (RL) and adaptive dynamic programming (ADP), this study proposes a model-free adaptive optimal tracking algorithm. By constructing an augmented system from the tracking errors and the reference states, the tracking problem is converted into a regulation problem for the augmented system. Several RL techniques are combined into a novel algorithm that learns the optimal solution online, in real time, without any knowledge of the system dynamics. The continuous-time adaptation laws are driven by both current observations and past experience. Convergence is guaranteed by Lyapunov analysis. Simulations on a linear system and a non-linear system demonstrate the performance of the proposed approach.
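As a minimal sketch of the augmentation step, assume a linear plant x' = Ax + Bu and a linear reference generator r' = Fr (both hypothetical models; the abstract does not specify them). With tracking error e = x - r, the augmented state X = [e; r] obeys X' = TX + B1 u with T = [[A, A - F], [0, F]] and B1 = [B; 0], because e' = Ax + Bu - Fr = Ae + (A - F)r + Bu. For a non-linear plant x' = f(x) + g(x)u and reference r' = h(r), the same construction gives e' = f(e + r) + g(e + r)u - h(r). A state weight Q1 = diag(Q, 0) then penalises only the error; since r generally does not decay, such regulation problems are commonly posed with a discounted cost, which is an assumption here rather than a detail stated in the abstract. The helper names below are hypothetical, and the code only checks that the augmented dynamics reproduce e = x - r; it does not implement the model-free learning algorithm itself.

```python
import numpy as np


def augment_linear_tracking(A, B, F):
    """Build T, B1 for X' = T X + B1 u with X = [e; r].

    Assumes (hypothetically) plant x' = A x + B u, reference r' = F r,
    and tracking error e = x - r, so x and r share dimension n.
    """
    n = A.shape[0]
    m = B.shape[1]
    # e' = A e + (A - F) r + B u ;  r' = F r  (u does not drive r)
    T = np.block([[A, A - F], [np.zeros((n, n)), F]])
    B1 = np.vstack([B, np.zeros((n, m))])
    return T, B1


def augmented_weight(Q):
    """Q1 = diag(Q, 0): penalise the error e only, not the reference r."""
    n = Q.shape[0]
    Z = np.zeros((n, n))
    return np.block([[Q, Z], [Z, Z]])


def simulate(A, B, F, x0, r0, u_fn, dt=1e-3, steps=2000):
    """Forward-Euler integrate the original (x, r) pair and the augmented X.

    Returns the largest deviation between X[:n] and x - r over the run.
    """
    T, B1 = augment_linear_tracking(A, B, F)
    n = A.shape[0]
    x, r = x0.copy(), r0.copy()
    X = np.concatenate([x0 - r0, r0])  # augmented initial state [e0; r0]
    max_gap = 0.0
    for k in range(steps):
        u = u_fn(k * dt)  # the same input drives both representations
        x = x + dt * (A @ x + B @ u)
        r = r + dt * (F @ r)
        X = X + dt * (T @ X + B1 @ u)
        max_gap = max(max_gap, float(np.max(np.abs(X[:n] - (x - r)))))
    return max_gap


if __name__ == "__main__":
    A = np.array([[0.0, 1.0], [-1.0, -0.5]])  # hypothetical damped plant
    B = np.array([[0.0], [1.0]])
    F = np.array([[0.0, 1.0], [-1.0, 0.0]])   # sinusoidal reference generator
    gap = simulate(A, B, F, np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                   lambda t: np.array([np.sin(t)]))
    print("max |e_aug - (x - r)| =", gap)
```

Because both integrations apply the same Euler step, the reported gap stays at floating-point round-off level; an algebra slip in T or B1 would instead produce a gap that grows over the run.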

Using reinforcement learning techniques to solve continuous-time non-linear optimal tracking problem without system dynamics