Deep imitation reinforcement learning with expert demonstration data

In recent years, deep reinforcement learning (DRL) has achieved impressive results in many fields. However, existing DRL algorithms usually require a large amount of exploration to obtain a good action policy. In addition, in many complex settings, the reward function cannot be designed well enough to capture the task requirements. These two problems make it difficult for DRL to learn a good action policy within a reasonably short time. Expert data can provide effective guidance and avoid unnecessary exploration. This study proposes a deep imitation reinforcement learning (DIRL) algorithm that uses a limited amount of expert demonstration data to speed up DRL training. In the proposed method, the learning agent first imitates the expert's action policy by learning from the demonstration data. After imitation learning, DRL is used to optimise the action policy through self-directed learning. Experimental comparisons on the Mario racing video game show that the proposed DIRL algorithm with expert demonstration data achieves much better performance than previous DRL algorithms without expert guidance.
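To make the two-phase idea concrete, the sketch below illustrates one plausible realisation of imitation pretraining followed by DQN-style self-learning. It is not the paper's exact implementation: the network architecture, the names QNetwork, imitation_step and dqn_step, and the choice of a cross-entropy behavioural-cloning loss and a smooth-L1 temporal-difference loss are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete action (hypothetical architecture)."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, state):
        return self.net(state)

def imitation_step(q_net, optimizer, states, expert_actions):
    """Phase 1 (assumed behavioural cloning): treat the Q-network outputs as
    logits and push the network to prefer the expert's action in each
    demonstrated state."""
    logits = q_net(states)
    loss = F.cross_entropy(logits, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def dqn_step(q_net, target_net, optimizer, batch, gamma=0.99):
    """Phase 2: standard one-step Q-learning update on self-collected transitions."""
    states, actions, rewards, next_states, dones = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * (1 - dones) * target_net(next_states).max(1).values
    loss = F.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In this reading, the demonstration data supply the supervised targets for imitation_step, and the resulting network initialises the agent that is then refined by dqn_step on its own experience.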
