Deep imitation reinforcement learning with expert demonstration data

In recent years, deep reinforcement learning (DRL) has achieved impressive results in many fields. However, existing DRL algorithms usually require a large amount of exploration to obtain a good action policy. In addition, in many complex settings, the reward function cannot be designed well enough to capture the task requirements. These two problems make it difficult for DRL to learn a good action policy within a reasonably short time. Expert data can provide effective guidance and avoid unnecessary exploration. This study proposes a deep imitation reinforcement learning (DIRL) algorithm that uses a limited amount of expert demonstration data to speed up DRL training. In the proposed method, the learning agent first imitates the expert's action policy by learning from the demonstration data. After imitation learning, DRL is used to optimise the action policy through self-directed learning. Experimental comparisons on the Mario racing video game show that the proposed DIRL algorithm with expert demonstration data achieves much better performance than previous DRL algorithms without expert guidance.
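To make the two-phase idea concrete, the sketch below illustrates one plausible realisation of imitation pretraining followed by DQN-style self-learning. It is not the paper's exact implementation: the network architecture, the names QNetwork, imitation_step and dqn_step, and the choice of a cross-entropy behavioural-cloning loss and a smooth-L1 temporal-difference loss are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete action (hypothetical architecture)."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, state):
        return self.net(state)

def imitation_step(q_net, optimizer, states, expert_actions):
    """Phase 1 (assumed behavioural cloning): treat the Q-network outputs as
    logits and push the network to prefer the expert's action in each
    demonstrated state."""
    logits = q_net(states)
    loss = F.cross_entropy(logits, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def dqn_step(q_net, target_net, optimizer, batch, gamma=0.99):
    """Phase 2: standard one-step Q-learning update on self-collected transitions."""
    states, actions, rewards, next_states, dones = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * (1 - dones) * target_net(next_states).max(1).values
    loss = F.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In this reading, the demonstration data supply the supervised targets for imitation_step, and the resulting network initialises the agent that is then refined by dqn_step on its own experience.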
