This study investigates online adaptive optimal control for a class of continuous-time Markov jump linear systems (MJLSs) using a novel policy iteration algorithm. Using a new decoupling technique, termed subsystems transformation, the authors reconstruct the MJLSs and obtain a new set of coupled systems composed of N subsystems. The online policy iteration algorithm is then used to solve the coupled algebraic matrix Riccati equations with only partial knowledge of the system dynamics, and the corresponding optimal controllers, equivalent to those of the investigated MJLSs, are designed. Moreover, the convergence of the novel policy iteration algorithm is established. Finally, a simulation example illustrates the effectiveness and applicability of the proposed approach.
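To make the object being solved concrete, the following is a minimal sketch of a standard model-based (offline) policy iteration for coupled algebraic Riccati equations. It is not the authors' online algorithm: it assumes full knowledge of the mode matrices and does not use the subsystems transformation. The assumed equations take the usual form A_i'P_i + P_i A_i + Q_i - P_i B_i R_i^{-1} B_i' P_i + sum_j pi_ij P_j = 0 for i = 1, ..., N, where pi_ij are the transition rates. All numerical values, the names `rates`, `policy_iteration`, and `coupled_are_residual`, and the choice of a zero initial gain are illustrative assumptions.

```python
import numpy as np

def coupled_are_residual(A, B, Q, R, rates, P):
    """Residual of each coupled algebraic Riccati equation at the given P_i."""
    N = len(A)
    out = []
    for i in range(N):
        coupling = sum(rates[i, j] * P[j] for j in range(N))
        S = B[i] @ np.linalg.solve(R[i], B[i].T)
        out.append(A[i].T @ P[i] + P[i] @ A[i] + Q[i] - P[i] @ S @ P[i] + coupling)
    return out

def policy_iteration(A, B, Q, R, rates, K0, iters=30):
    """Kleinman-type iteration: evaluate the current gains by solving the
    coupled Lyapunov equations exactly, then improve each gain."""
    N, n = len(A), A[0].shape[0]
    I = np.eye(n)
    K = [k.copy() for k in K0]
    for _ in range(iters):
        # Policy evaluation: one linear system in the stacked vec(P_i).
        L = np.kron(rates, np.eye(n * n))          # coupling terms pi_ij * P_j
        rhs = np.zeros(N * n * n)
        for i in range(N):
            Acl = A[i] - B[i] @ K[i]               # closed-loop matrix of mode i
            blk = slice(i * n * n, (i + 1) * n * n)
            L[blk, blk] += np.kron(Acl.T, I) + np.kron(I, Acl.T)
            rhs[blk] = -(Q[i] + K[i].T @ R[i] @ K[i]).reshape(-1)
        p = np.linalg.solve(L, rhs)
        P = [p[i * n * n:(i + 1) * n * n].reshape(n, n) for i in range(N)]
        P = [0.5 * (M + M.T) for M in P]           # remove round-off asymmetry
        # Policy improvement: K_i = R_i^{-1} B_i' P_i.
        K = [np.linalg.solve(R[i], B[i].T @ P[i]) for i in range(N)]
    return P, K

# Hypothetical two-mode example (values chosen only for illustration).
A = [np.array([[-1.0, 0.5], [0.0, -2.0]]), np.array([[-1.5, 0.0], [0.3, -1.0]])]
B = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.5]])]
Q = [np.eye(2), np.eye(2)]
R = [np.eye(1), np.eye(1)]
rates = np.array([[-2.0, 2.0], [1.0, -1.0]])       # rows sum to zero
K0 = [np.zeros((1, 2)), np.zeros((1, 2))]          # assumed stabilising start

P, K = policy_iteration(A, B, Q, R, rates, K0)
print(max(np.abs(r).max() for r in coupled_are_residual(A, B, Q, R, rates, P)))
```

The zero initial gain is admissible in this example only because both open-loop modes share the Lyapunov function x'x, so the uncontrolled jump system is already mean-square stable; in general a stochastically stabilising initial gain has to be supplied.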

Online adaptive optimal control for continuous-time Markov jump linear systems using a novel policy iteration algorithm
