© The Institution of Engineering and Technology
A reinforcement Q-learning algorithm is proposed for the optimal tracking control problem of systems with unknown dynamics and delays. Traditional reinforcement learning methods require an accurate system model; the Q-learning method avoids this requirement. This is valuable in practical implementation, because an accurate model of the system is often difficult to obtain, in whole or in part, or is costly to identify. First, an augmented system composed of the original system and the reference trajectory is constructed, and the corresponding augmented linear quadratic tracking (LQT) Bellman equation is derived. On this basis, the reinforcement Q-learning algorithm is presented. To implement the method, the iteration equations are solved online using the least-squares technique.
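The scheme above can be sketched in code. The following is a minimal illustration, not the Letter's implementation: a scalar plant, reference model, and cost weights are assumed for the example, the dynamics are used only to generate data (the learner never reads them), and the Q-function kernel of the augmented LQT problem is estimated by least squares inside a policy-iteration loop.

```python
# Hedged sketch of model-free Q-learning for an augmented LQT problem.
# All numerical values (A, B, F, Q1, R, gamma) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Plant x_{k+1} = A x_k + B u_k and reference r_{k+1} = F r_k
# (unknown to the learner; used only to simulate transitions).
A, B, F = 0.8, 1.0, 1.0
Q1, R, gamma = 1.0, 0.1, 0.9      # tracking weight, control weight, discount

n, m = 2, 1                       # augmented state X = [x, r], scalar input
z_dim = n + m                     # z = [X; u]

def quad_basis(z):
    """Basis such that quad_basis(z) @ vech(H) = z^T H z for symmetric H."""
    out = []
    for i in range(z_dim):
        for j in range(i, z_dim):
            out.append((1.0 if i == j else 2.0) * z[i] * z[j])
    return np.array(out)

def vech_to_H(v):
    """Rebuild the symmetric kernel H from its upper-triangular entries."""
    H = np.zeros((z_dim, z_dim))
    idx = 0
    for i in range(z_dim):
        for j in range(i, z_dim):
            H[i, j] = H[j, i] = v[idx]
            idx += 1
    return H

K = np.zeros((m, n))              # initial feedback gain, u = -K X
for it in range(20):              # policy iteration
    Phi, y = [], []
    for k in range(80):           # transitions from broadly sampled states
        x, r = rng.uniform(-2.0, 2.0, size=2)
        X = np.array([x, r])
        u = float(-K @ X) + 0.2 * rng.standard_normal()  # exploration noise
        cost = Q1 * (x - r) ** 2 + R * u ** 2
        X_next = np.array([A * x + B * u, F * r])
        u_next = float(-K @ X_next)        # next action under current policy
        z, z_next = np.append(X, u), np.append(X_next, u_next)
        # LQT Bellman equation: z^T H z = cost + gamma * z'^T H z'
        Phi.append(quad_basis(z) - gamma * quad_basis(z_next))
        y.append(cost)
    v, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = vech_to_H(v)
    Huu, Hux = H[n:, n:], H[n:, :n]
    K = np.linalg.solve(Huu, Hux)          # policy improvement
```

The learned gain `K` acts on the augmented state, so it contains both a feedback term on the plant state and a feedforward term on the reference; no knowledge of A, B, or F enters the least-squares step.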