Time-in-action RL

The authors propose a novel reinforcement learning (RL) framework in which agent behaviour is governed by traditional control theory. This integrated approach, called time-in-action RL, makes RL applicable to many real-world systems whose underlying dynamics are already known in control-theoretic form. The key insight enabling this integration is to model an explicit time function that maps a state-action pair to the time the underlying controller needs to accomplish the action. In this framework, an action is described by its value (the action value) and the time it takes to perform (the action time). The action value results from the RL policy given a state, while the action time is estimated by an explicit time model learnt from the measured activities of the underlying controller. The RL value network is then trained with the embedded time model, which predicts the action time. The approach is tested on a variant of Atari Pong and shown to converge.

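The following is a minimal, hypothetical sketch of the time-in-action idea as summarised above, not the authors' implementation. It assumes (i) a learned time model that predicts, from state-action features, how long the underlying controller needs to complete an action, and (ii) a semi-MDP-style Q-update in which the discount factor is raised to the predicted action time, gamma**tau. All names (TimeModel, q_update, the feature and reward generators) are illustrative.

import numpy as np

class TimeModel:
    """Linear regressor mapping a (state, action) feature vector to action time."""
    def __init__(self, dim, lr=1e-2):
        self.w = np.zeros(dim)
        self.lr = lr

    def predict(self, phi):
        # Predicted action time must stay positive; clip at a small floor.
        return max(float(self.w @ phi), 1e-3)

    def update(self, phi, measured_time):
        # Fit to the completion time actually measured from the underlying controller.
        error = measured_time - self.predict(phi)
        self.w += self.lr * error * phi

def q_update(q, s, a, reward, s_next, tau, gamma=0.99, alpha=0.1):
    """One tabular Q-learning step with time-dependent discount gamma**tau."""
    target = reward + (gamma ** tau) * np.max(q[s_next])
    q[s, a] += alpha * (target - q[s, a])

# Toy usage on a 5-state, 2-action problem with random stand-in features.
rng = np.random.default_rng(0)
q = np.zeros((5, 2))
time_model = TimeModel(dim=4)
for _ in range(100):
    s, a = rng.integers(5), rng.integers(2)
    phi = rng.normal(size=4)                  # stand-in state-action features
    measured_time = 1.0 + 0.5 * rng.random()  # time reported by the controller
    time_model.update(phi, measured_time)
    q_update(q, s, a, reward=rng.random(), s_next=rng.integers(5),
             tau=time_model.predict(phi))

The only structural difference from standard Q-learning is that the fixed one-step discount gamma is replaced by gamma**tau, where tau comes from the time model rather than being a constant; everything else follows the usual tabular update.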