access icon free Gittins index based control policy for a class of pursuit-evasion problems

In this study, the authors develop a novel approach to a class of pursuit-evasion problems modelled in the form of discrete time feedback control systems, where the opposing parties have asymmetric capability. The authors assume the control policy of the evader, described as a random variable, is unknown to the pursuer. This pursuit-evasion problem is formulated as a quadratic optimisation problem from the perspective of the pursuer. Due to the curse of dimensionality, this pursuit-evasion problem cannot be practically solved by dynamic programming. In this study, the authors reformulate it as a multi-armed bandit problem. A heuristic policy based on the Gittins index is proposed to solve this problem, which can be computed based on a forward induction. Simulation results show the proposed policy outperforms a random decision policy.

Inspec keywords: discrete time systems; feedback; quadratic programming; probability

Other keywords: quadratic optimisation problem; forward induction; pursuit-evasion problems; asymmetric capability; Gittins index based control policy; heuristic policy; evader control policy; discrete time feedback control systems; random decision policy; random variable

Subjects: Discrete control systems; Other topics in statistics; Optimisation techniques

References

    1. 1)
      • 17. Sanchez-Lopez, J.L., Pestana, J., Collumeau, J.F., et al.: ‘A vision based aerial robot solution for the mission 7 of the International Aerial Robotics Competition’. 2015 Int. Conf. on Unmanned Aircraft Systems, (ICUAS), 2015, pp. 13911400.
    2. 2)
      • 11. Gittins, J.C.: ‘Bandit processes and dynamic allocation indices’, J. R. Stat. Soc. B, Methods, 1979, 41, pp. 148177.
    3. 3)
      • 3. Nowakowski, R., Winkler, P.: ‘Vertex-to-vertex pursuit in a graph’, J. Discrete Math., 1983, 43, (2-3), pp. 235239.
    4. 4)
      • 12. Moore, J.B., Zhou, X., Lim, A.E.B.: ‘Discrete time LQG controls with control dependent noise’, Syst. Control Lett., 1999, 36, (3), pp. 199206.
    5. 5)
      • 6. Li, W.: ‘A dynamics perspective of pursuit-evasion: capturing and escaping when the pursuer runs faster than the agile evader’, IEEE Trans. Autom. Control, 2017, 62, (1), pp. 451457.
    6. 6)
      • 1. Basar, T., Olsder, G.J.: ‘Dynamic noncooperative game theory’ (SIAM, Philadelphia, 1999).
    7. 7)
      • 8. Mahajan, A., Teneketzis, D.: ‘Multi-armed bandit problems’, Found. Appl. Sen. Mana., 2008, pp. 121151.
    8. 8)
      • 16. Pandelis, D.G., Teneketzis, D.: ‘On the optimality of the Gittins index rule for multi-armed bandits with multiple plays’, Math. Methods Oper. Res., 1999, 50, (3), pp. 449461.
    9. 9)
      • 18. Do, M.N., Vetterli, M.: ‘Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance’, IEEE Trans. Image Proc., 2002, 11, (2), pp. 146158.
    10. 10)
      • 4. Parsons, T.D.: ‘Pursuit-evasion in a graph’, Theory and Applications of Graphs, 1976(Lecture Notes in Mathematics), pp. 426441.
    11. 11)
      • 9. Bubeck, S., Cesa-Bianchi, N.: ‘Regret analysis of stochastic and nonstochastic multi-armed bandit problems’, Found. Tre. Mach. Lear., 2012, 5, (1), pp. 1122.
    12. 12)
      • 13. Judd, K.L.: ‘The law of large numbers with a continuum of iid random variables’, J. Econ. Theory, 1985, 35, (1), pp. 1925.
    13. 13)
      • 2. Ho, Y., Bryson, A., Baron, S.: ‘Differential games and optimal pursuit-evasion strategies’, IEEE Trans. Autom. Control, 1965, 10, (4), pp. 385389.
    14. 14)
      • 10. Liu, K., Zhao, Q.: ‘Distributed learning in multi-armed bandit with multiple players’, IEEE Trans. Signal Process., 2010, 58, (11), pp. 56675681.
    15. 15)
      • 14. Weber, R.R.: ‘On the Gittins index for multiarmed bandits’, Ann. Probab., 1992, 2, (4), pp. 10241033.
    16. 16)
      • 7. Gupta, A., Nayyar, A., Langbort, C., et al.: ‘Common information based Markov perfect equilibria for linear-Gaussian games with asymmetric information’, SIAM J. Control Optim., 2014, 52, (5), pp. 32283260.
    17. 17)
      • 5. Sugihara, K., Suzuki, I.: ‘Optimal algorithms for a pursuit-evasion problem in grids’, SIAM J. Discrete Math., 1989, 2, (1), pp. 126143.
    18. 18)
      • 15. Varaiya, P., Walrand, J., Buyukkoc, C.: ‘Extensions of the multiarmed bandit problem: the discounted case’, IEEE Trans. Autom. Control, 1985, 30, (5), pp. 426439.
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cta.2017.0398
Loading

Related content

content/journals/10.1049/iet-cta.2017.0398
pub_keyword,iet_inspecKeyword,pub_concept
6
6
Loading