In this study, the authors develop a novel approach to a class of pursuit-evasion problems modelled in the form of discrete time feedback control systems, where the opposing parties have asymmetric capability. The authors assume the control policy of the evader, described as a random variable, is unknown to the pursuer. This pursuit-evasion problem is formulated as a quadratic optimisation problem from the perspective of the pursuer. Due to the curse of dimensionality, this pursuit-evasion problem cannot be practically solved by dynamic programming. In this study, the authors reformulate it as a multi-armed bandit problem. A heuristic policy based on the Gittins index is proposed to solve this problem, which can be computed based on a forward induction. Simulation results show the proposed policy outperforms a random decision policy.

References

1. 1)
  - 17. Sanchez-Lopez, J.L., Pestana, J., Collumeau, J.F., et al.: ‘A vision based aerial robot solution for the mission 7 of the International Aerial Robotics Competition’. 2015 Int. Conf. on Unmanned Aircraft Systems, (ICUAS), 2015, pp. 1391–1400.
2. 2)
  - 11. Gittins, J.C.: ‘Bandit processes and dynamic allocation indices’, J. R. Stat. Soc. B, Methods, 1979, 41, pp. 148–177.
3. 3)
  - 3. Nowakowski, R., Winkler, P.: ‘Vertex-to-vertex pursuit in a graph’, J. Discrete Math., 1983, 43, (2-3), pp. 235–239.
4. 4)
  - 12. Moore, J.B., Zhou, X., Lim, A.E.B.: ‘Discrete time LQG controls with control dependent noise’, Syst. Control Lett., 1999, 36, (3), pp. 199–206.
5. 5)
  - 6. Li, W.: ‘A dynamics perspective of pursuit-evasion: capturing and escaping when the pursuer runs faster than the agile evader’, IEEE Trans. Autom. Control, 2017, 62, (1), pp. 451–457.
6. 6)
  - 1. Basar, T., Olsder, G.J.: ‘Dynamic noncooperative game theory’ (SIAM, Philadelphia, 1999).
7. 7)
  - 8. Mahajan, A., Teneketzis, D.: ‘Multi-armed bandit problems’, Found. Appl. Sen. Mana., 2008, pp. 121–151.
8. 8)
  - 16. Pandelis, D.G., Teneketzis, D.: ‘On the optimality of the Gittins index rule for multi-armed bandits with multiple plays’, Math. Methods Oper. Res., 1999, 50, (3), pp. 449–461.
9. 9)
  - 18. Do, M.N., Vetterli, M.: ‘Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance’, IEEE Trans. Image Proc., 2002, 11, (2), pp. 146–158.
10. 10)
  - 4. Parsons, T.D.: ‘Pursuit-evasion in a graph’, Theory and Applications of Graphs, 1976(Lecture Notes in Mathematics), pp. 426–441.
11. 11)
  - 9. Bubeck, S., Cesa-Bianchi, N.: ‘Regret analysis of stochastic and nonstochastic multi-armed bandit problems’, Found. Tre. Mach. Lear., 2012, 5, (1), pp. 1–122.
12. 12)
  - 13. Judd, K.L.: ‘The law of large numbers with a continuum of iid random variables’, J. Econ. Theory, 1985, 35, (1), pp. 19–25.
13. 13)
  - 2. Ho, Y., Bryson, A., Baron, S.: ‘Differential games and optimal pursuit-evasion strategies’, IEEE Trans. Autom. Control, 1965, 10, (4), pp. 385–389.
14. 14)
  - 10. Liu, K., Zhao, Q.: ‘Distributed learning in multi-armed bandit with multiple players’, IEEE Trans. Signal Process., 2010, 58, (11), pp. 5667–5681.
15. 15)
  - 14. Weber, R.R.: ‘On the Gittins index for multiarmed bandits’, Ann. Probab., 1992, 2, (4), pp. 1024–1033.
16. 16)
  - 7. Gupta, A., Nayyar, A., Langbort, C., et al.: ‘Common information based Markov perfect equilibria for linear-Gaussian games with asymmetric information’, SIAM J. Control Optim., 2014, 52, (5), pp. 3228–3260.
17. 17)
  - 5. Sugihara, K., Suzuki, I.: ‘Optimal algorithms for a pursuit-evasion problem in grids’, SIAM J. Discrete Math., 1989, 2, (1), pp. 126–143.
18. 18)
  - 15. Varaiya, P., Walrand, J., Buyukkoc, C.: ‘Extensions of the multiarmed bandit problem: the discounted case’, IEEE Trans. Autom. Control, 1985, 30, (5), pp. 426–439.

Gittins index based control policy for a class of pursuit-evasion problems

References

Related content