Deep imitation reinforcement learning with expert demonstration data

