The idea of value iteration has been applied to the online learning of optimal controllers for discrete-time (DT) systems for many years. In the work of Werbos (1974, 1989, 1991, 1992, 2009), a family of DT learning control algorithms based on value iteration ideas has been developed. These techniques are known as approximate dynamic programming or adaptive dynamic programming (ADP). ADP includes heuristic dynamic programming (HDP) (which is value iteration), dual heuristic programming, and action-dependent variants of those algorithms, which are equivalent to Q-learning for the DT dynamical system $x_{k+1} = f(x_k) + g(x_k)u_k$. Value iteration algorithms rely on the special form of the DT Bellman equation $V(x_k) = r(x_k, u_k) + \gamma V(x_{k+1})$, with $r(x_k, u_k)$ the utility or stage cost of the value function. This equation contains two occurrences of the value function, evaluated at the two times $k$ and $k+1$, and does not depend on the system dynamics $f(x_k)$, $g(x_k)$.
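To make the DT Bellman recursion concrete, the following is a minimal sketch of value iteration (HDP) specialized to the linear-quadratic case, where $V_j(x) = x^T P_j x$ and the update $V_{j+1}(x_k) = \min_u [r(x_k, u) + \gamma V_j(x_{k+1})]$ reduces to a Riccati-like recursion on $P_j$. The system matrices, quadratic stage cost, and discount factor below are illustrative assumptions, not data from the source.

```python
# Sketch: value iteration (HDP) for a discrete-time LQR problem.
# Assumed stage cost r(x_k, u_k) = x_k^T Q x_k + u_k^T R u_k; the matrices
# A, B, Q, R and the discount factor gamma are hypothetical example choices.
import numpy as np

def value_iteration_lqr(A, B, Q, R, gamma=0.95, tol=1e-10, max_iter=10_000):
    """Iterate V_{j+1}(x_k) = min_u [r(x_k, u) + gamma * V_j(x_{k+1})]
    with V_j(x) = x^T P_j x, starting from V_0 = 0 (i.e., P_0 = 0)."""
    n = A.shape[0]
    P = np.zeros((n, n))
    K = np.zeros((B.shape[1], n))
    for _ in range(max_iter):
        # Minimizing over u yields the gain K_j and the update for P_{j+1}.
        S = R + gamma * B.T @ P @ B
        K = gamma * np.linalg.solve(S, B.T @ P @ A)
        P_next = Q + gamma * A.T @ P @ A - gamma * A.T @ P @ B @ K
        if np.max(np.abs(P_next - P)) < tol:
            return P_next, K
        P = P_next
    return P, K

# Illustrative double-integrator-like system (hypothetical example data).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = value_iteration_lqr(A, B, Q, R)
print("Converged value kernel P:\n", P)
print("Feedback gain K (u_k = -K x_k):\n", K)
```

Note that the update uses only sampled pairs $(x_k, x_{k+1})$ through $V_j(x_{k+1})$; this is the sense in which the Bellman equation itself does not require explicit knowledge of $f(x_k)$, $g(x_k)$.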