Time-in-action RL
