© The Institution of Engineering and Technology
A novel hardware architecture for performing the core computations required by dynamic programming (DP) techniques is introduced. DP techniques apply to a vast range of problems in which an optimal sequence of decisions must be obtained. The underlying assumption is that a complete model of the environment is available and that its dynamics are governed by a Markov decision process (MDP). Existing DP implementations have traditionally been software-based. Here, the authors present a method that exploits the inherent parallelism in computing both the value function and the optimal policy. This allows the optimal policy to be obtained several orders of magnitude faster than with traditional software implementations, establishing the viability of the approach for demanding, real-time applications. The well-known rental car management problem was studied as a benchmark, for which a field-programmable gate array (FPGA)-based implementation was designed. The results highlight the advantages of the proposed approach in terms of execution speed and scalability.
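The computation being accelerated can be illustrated with a minimal software sketch of synchronous value iteration. The sketch assumes a small finite MDP, encoded with a hypothetical layout `P[s][a] = [(probability, next_state, reward), ...]`, and a discount factor `gamma`. It also assumes every state has at least one action. This is the textbook algorithm, not the authors' FPGA architecture.

Its purpose is to show where the parallelism comes from. Within one sweep, every state's Bellman backup reads only the previous value vector. Each state's action values are also computed independently of one another. As a result, all of these backups could in principle be evaluated concurrently.

```python
from typing import List, Tuple

# Hypothetical MDP encoding (illustrative, not from the paper):
# P[s][a] is a list of (probability, next_state, reward) outcomes.
Transition = Tuple[float, int, float]
MDP = List[List[List[Transition]]]


def bellman_backup(P: MDP, V: List[float], s: int, gamma: float) -> Tuple[float, int]:
    """Return (best action value, best action) for state s, using only the old values V."""
    best_q, best_a = float("-inf"), 0
    for a, outcomes in enumerate(P[s]):
        # Expected one-step return of action a; independent of the other actions.
        q = sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
        if q > best_q:
            best_q, best_a = q, a
    return best_q, best_a


def value_iteration(P: MDP, gamma: float = 0.9, tol: float = 1e-8,
                    max_sweeps: int = 10_000) -> Tuple[List[float], List[int]]:
    """Synchronous value iteration followed by greedy policy extraction."""
    V = [0.0] * len(P)
    for _ in range(max_sweeps):
        # Synchronous sweep: every backup reads the previous V only, so the
        # |S| backups are mutually independent (parallelizable across states).
        V_new = [bellman_backup(P, V, s, gamma)[0] for s in range(len(P))]
        delta = max(abs(new - old) for new, old in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = [bellman_backup(P, V, s, gamma)[1] for s in range(len(P))]
    return V, policy


if __name__ == "__main__":
    # Toy two-state MDP: action 0 = stay, action 1 = move to the other state.
    P = [
        [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: stay (r=0) or move (r=1)
        [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1: stay (r=2) or move (r=0)
    ]
    V, policy = value_iteration(P, gamma=0.9)
    print(V, policy)  # policy -> [1, 0]
```

With `gamma = 0.9`, the greedy policy moves from state 0 to state 1 and then stays there. The resulting values are close to 19 and 20. A hardware implementation could assign the independent per-state (and per-action) backups to parallel processing elements.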
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt_20070027