© The Institution of Engineering and Technology
Motivated by the optimisation of network communication systems, this paper presents a hierarchical analytical model for event-driven switching control of stochastic dynamic systems. First, the model, termed the semi-Markov switching state-space control process, is introduced. The semi-Markov kernel and an equivalent infinitesimal generator are constructed to characterise the hierarchical dynamics, and a sensitivity formula for the performance difference under the average criterion is derived from potential theory. Then, by exploiting the structure of the dynamic hierarchy and the features of the event-driven policy, an online adaptive optimisation algorithm combining potential estimation and policy iteration is proposed, and its convergence is proved. Finally, as an illustrative example, the dynamic service composition problem in service overlay networks is formulated and solved. Simulation results demonstrate the effectiveness of the proposed approach.