Adaptive output-feedback quadratic tracking control of continuous-time systems via value iteration with its application

This study develops an adaptive output-feedback (OPFB) tracking control scheme for continuous-time linear systems that achieves the infinite-horizon linear quadratic tracking (LQT) solution. Existence conditions for an OPFB form of the LQT solution are established, and an upper bound on the discount factor is derived to guarantee the stability of the OPFB solution. To obtain an online learning solution that requires no knowledge of the system drift dynamics, a novel value iteration (VI) algorithm is presented based on the integral reinforcement learning technique, using measured augmented system states. A convergence analysis of the VI algorithm is also provided. Compared with the policy iteration method, the VI algorithm relaxes the requirement for an initial stabilising control policy. To further obviate the need for measured system states, a neural network-based adaptive observer is employed during learning; it is no longer needed once the online learning algorithm converges. The effectiveness of the proposed learning scheme is illustrated through an application to a single-phase grid-connected PV power inverter.
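
For context, the discounted LQT problem stacks the plant state x with the reference-generator state r into an augmented state X = [x; r] and learns a value of the form V(X(t)) = integral from t to infinity of exp(-gamma (tau - t)) (e'Qe + u'Ru) dtau, with tracking error e = Cx - r. The sketch below is a minimal, model-based rendering of the VI recursion that the paper's integral-reinforcement-learning algorithm approximates from measured data alone; it assumes known dynamics (A, B, F, C), and the function name, step size eps, and stopping rule are illustrative choices, not the paper's implementation.

```python
import numpy as np

def vi_discounted_lqt(A, B, F, C, Q, R, gamma, eps=1e-4, iters=200000, tol=1e-10):
    """Model-based sketch of value iteration (VI) for the discounted
    continuous-time LQT problem on the augmented state X = [x; r].

    The paper's algorithm is model-free (integral reinforcement learning
    from measured data); known (A, B, F, C) are assumed here only to make
    the underlying VI recursion explicit.
    """
    n, m = B.shape
    q = F.shape[0]
    # Augmented dynamics: plant x' = A x + B u, reference r' = F r.
    T = np.block([[A, np.zeros((n, q))],
                  [np.zeros((q, n)), F]])
    B1 = np.vstack([B, np.zeros((q, m))])
    C1 = np.hstack([C, -np.eye(q)])          # tracking error e = C x - r
    Q1 = C1.T @ Q @ C1
    Rinv = np.linalg.inv(R)

    # Unlike policy iteration, VI needs no initial stabilising policy:
    # it can simply start from P = 0.
    P = np.zeros((n + q, n + q))
    for _ in range(iters):
        # Residual of the discounted algebraic Riccati equation
        #   T'P + P T - gamma*P + Q1 - P B1 R^{-1} B1' P = 0,
        # where gamma must stay below the paper's stability bound.
        res = T.T @ P + P @ T - gamma * P + Q1 - P @ B1 @ Rinv @ B1.T @ P
        P = P + eps * res                    # small-step VI update
        if np.linalg.norm(res) < tol:
            break
    K = Rinv @ B1.T @ P                      # approximate LQT gain, u = -K X
    return P, K
```

For a sufficiently small step size, this recursion integrates the differential Riccati equation forward in its iteration index and, under standard stabilisability and detectability conditions, converges to the discounted ARE solution; the paper's contribution is to reproduce this iteration from input/output data via integral reinforcement learning, with a convergence analysis for that data-driven counterpart.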

Inspec keywords: neurocontrollers; nonlinear control systems; closed loop systems; stability; photovoltaic power systems; discrete time systems; adaptive control; linear quadratic control; learning (artificial intelligence); continuous time systems; feedback; dynamic programming; control system synthesis; observers; optimal control; linear systems; iterative methods

Other keywords: neural network-based adaptive observer; LQT solution; VI algorithm; existence conditions; augmented system states; OPFB solution; continuous-time linear systems; online learning solution; initial stabilising control policy requirement; policy iteration method; integral reinforcement learning technique; adaptive output-feedback tracking control scheme; learning scheme; adaptive output-feedback quadratic tracking control; infinite-horizon linear quadratic tracking solution; OPFB control; continuous-time systems; value iteration; system drift dynamics

Subjects: Solar power stations and photovoltaic power systems; Linear control systems; Optimal control; Stability in control theory; Control system analysis and synthesis methods; Interpolation and function approximation (numerical analysis); Discrete control systems; Optimisation techniques; Self-adjusting control systems; Nonlinear control systems
