Value iteration for continuous-time systems


Author(s): Draguna Vrabie; Kyriakos G. Vamvoudakis; Frank L. Lewis
Source: Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles, 2012
Publication date: January 2012

The idea of value iteration has been applied to the online learning of optimal controllers for discrete-time (DT) systems for many years. Werbos (1974, 1989, 1991, 1992, 2009) developed a family of DT learning control algorithms based on value iteration ideas. These techniques are known as approximate dynamic programming or adaptive dynamic programming (ADP). ADP includes heuristic dynamic programming (HDP), which is value iteration; dual heuristic programming; and action-based variants of these algorithms, which are equivalent to Q-learning for the DT dynamical system x_{k+1} = f(x_k) + g(x_k)u_k. Value iteration algorithms rely on the special form of the DT Bellman equation V(x_k) = r(x_k, u_k) + γV(x_{k+1}), with r(x_k, u_k) the utility or stage cost of the value function. This equation contains the value function evaluated at the two times k and k+1, and it does not depend on the system dynamics f(x_k), g(x_k).
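To make the value iteration mechanism concrete, the following is a minimal sketch (not taken from the chapter) of HDP value iteration for the DT LQR special case, where the value function is quadratic, V(x) = x'Px, and the Bellman recursion V_{j+1}(x_k) = r(x_k, u_k) + V_j(x_{k+1}) (taking γ = 1) reduces to a Riccati difference recursion on P. The matrices A, B, Q, R below are arbitrary illustrative values; in the online learning setting treated in the book, the same update can be driven by measured data (x_k, u_k, x_{k+1}) rather than by the model, which is used here only for clarity.

```python
import numpy as np

# Illustrative HDP value iteration for the DT LQR special case.
# System and weights are arbitrary placeholders, not from the chapter.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # state weighting: stage cost r(x,u) = x'Qx + u'Ru
R = np.array([[1.0]])  # control weighting

P = np.zeros((2, 2))   # value iteration may start from P_0 = 0
for j in range(500):
    # Policy update: minimizing control gain for the current value estimate
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # Value update: V_{j+1}(x_k) = r(x_k, u_k) + V_j(x_{k+1}) with u_k = -K x_k,
    # i.e. P_{j+1} = Q + K'RK + (A - BK)' P_j (A - BK)
    Acl = A - B @ K
    P_next = Q + K.T @ R @ K + Acl.T @ P @ Acl
    if np.linalg.norm(P_next - P) < 1e-10:
        P = P_next
        break
    P = P_next

print("Converged P:\n", P)
print("Optimal gain K:\n", np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A))
```

Starting from P_0 = 0, this recursion converges to the solution of the discrete-time algebraic Riccati equation for a stabilizable pair (A, B); it is the same recursion that the chapter extends to continuous-time systems via integral reinforcement learning.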

Chapter Contents:

  • 6.1 Continuous-time heuristic dynamic programming for the LQR problem
  • 6.1.1 Continuous-time HDP formulation using integral reinforcement learning
  • 6.1.2 Online tuning value iteration algorithm for partially unknown systems
  • 6.2 Mathematical formulation of the HDP algorithm
  • 6.3 Simulation results for online CT-HDP design
  • 6.3.1 System model and motivation
  • 6.3.2 Simulation setup and results
  • 6.3.3 Comments on the convergence of CT-HDP algorithm
  • 6.4 Conclusion

Inspec keywords: iterative methods; discrete time systems; learning systems; optimal control; dynamic programming; continuous time systems

Other keywords: DT systems; ADP; optimal controllers; dual heuristic programming; online learning; approximate dynamic programming; HDP; value iteration algorithm; heuristic dynamic programming; Q learning; adaptive dynamic programming; action-based variants; DT learning control algorithms; continuous-time systems; DT Bellman equation; discrete-time systems; DT dynamical system

Subjects: Discrete control systems; Self-adjusting control systems; Optimisation techniques; Optimal control; Interpolation and function approximation (numerical analysis)
