© The Institution of Engineering and Technology
In this study, the authors develop a novel approach to a class of pursuit-evasion problems modelled in the form of discrete time feedback control systems, where the opposing parties have asymmetric capability. The authors assume the control policy of the evader, described as a random variable, is unknown to the pursuer. This pursuit-evasion problem is formulated as a quadratic optimisation problem from the perspective of the pursuer. Due to the curse of dimensionality, this pursuit-evasion problem cannot be practically solved by dynamic programming. In this study, the authors reformulate it as a multi-armed bandit problem. A heuristic policy based on the Gittins index is proposed to solve this problem, which can be computed based on a forward induction. Simulation results show the proposed policy outperforms a random decision policy.
References
-
-
1)
-
17. Sanchez-Lopez, J.L., Pestana, J., Collumeau, J.F., et al.: ‘A vision based aerial robot solution for the mission 7 of the International Aerial Robotics Competition’. 2015 Int. Conf. on Unmanned Aircraft Systems, (ICUAS), 2015, pp. 1391–1400.
-
2)
-
11. Gittins, J.C.: ‘Bandit processes and dynamic allocation indices’, J. R. Stat. Soc. B, Methods, 1979, 41, pp. 148–177.
-
3)
-
3. Nowakowski, R., Winkler, P.: ‘Vertex-to-vertex pursuit in a graph’, J. Discrete Math., 1983, 43, (2-3), pp. 235–239.
-
4)
-
12. Moore, J.B., Zhou, X., Lim, A.E.B.: ‘Discrete time LQG controls with control dependent noise’, Syst. Control Lett., 1999, 36, (3), pp. 199–206.
-
5)
-
6. Li, W.: ‘A dynamics perspective of pursuit-evasion: capturing and escaping when the pursuer runs faster than the agile evader’, IEEE Trans. Autom. Control, 2017, 62, (1), pp. 451–457.
-
6)
-
1. Basar, T., Olsder, G.J.: ‘Dynamic noncooperative game theory’ (SIAM, Philadelphia, 1999).
-
7)
-
8. Mahajan, A., Teneketzis, D.: ‘Multi-armed bandit problems’, Found. Appl. Sen. Mana., 2008, pp. 121–151.
-
8)
-
16. Pandelis, D.G., Teneketzis, D.: ‘On the optimality of the Gittins index rule for multi-armed bandits with multiple plays’, Math. Methods Oper. Res., 1999, 50, (3), pp. 449–461.
-
9)
-
18. Do, M.N., Vetterli, M.: ‘Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance’, IEEE Trans. Image Proc., 2002, 11, (2), pp. 146–158.
-
10)
-
4. Parsons, T.D.: ‘Pursuit-evasion in a graph’, Theory and Applications of Graphs, 1976, pp. 426–441.
-
11)
-
9. Bubeck, S., Cesa-Bianchi, N.: ‘Regret analysis of stochastic and nonstochastic multi-armed bandit problems’, Found. Tre. Mach. Lear., 2012, 5, (1), pp. 1–122.
-
12)
-
13. Judd, K.L.: ‘The law of large numbers with a continuum of iid random variables’, J. Econ. Theory, 1985, 35, (1), pp. 19–25.
-
13)
-
2. Ho, Y., Bryson, A., Baron, S.: ‘Differential games and optimal pursuit-evasion strategies’, IEEE Trans. Autom. Control, 1965, 10, (4), pp. 385–389.
-
14)
-
10. Liu, K., Zhao, Q.: ‘Distributed learning in multi-armed bandit with multiple players’, IEEE Trans. Signal Process., 2010, 58, (11), pp. 5667–5681.
-
15)
-
14. Weber, R.R.: ‘On the Gittins index for multiarmed bandits’, Ann. Probab., 1992, 2, (4), pp. 1024–1033.
-
16)
-
7. Gupta, A., Nayyar, A., Langbort, C., et al.: ‘Common information based Markov perfect equilibria for linear-Gaussian games with asymmetric information’, SIAM J. Control Optim., 2014, 52, (5), pp. 3228–3260.
-
17)
-
5. Sugihara, K., Suzuki, I.: ‘Optimal algorithms for a pursuit-evasion problem in grids’, SIAM J. Discrete Math., 1989, 2, (1), pp. 126–143.
-
18)
-
15. Varaiya, P., Walrand, J., Buyukkoc, C.: ‘Extensions of the multiarmed bandit problem: the discounted case’, IEEE Trans. Autom. Control, 1985, 30, (5), pp. 426–439.
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cta.2017.0398
Related content
content/journals/10.1049/iet-cta.2017.0398
pub_keyword,iet_inspecKeyword,pub_concept
6
6