Motion control of unmanned underwater vehicles via deep imitation reinforcement learning algorithm

In this study, a motion control algorithm based on deep imitation reinforcement learning is proposed for unmanned underwater vehicles (UUVs). The algorithm, termed imitation learning twin delayed deep deterministic policy gradient (IL-TD3), combines imitation learning (IL) with the twin delayed deep deterministic policy gradient (TD3) algorithm. To accelerate the training process of reinforcement learning, supervised learning is used in the IL stage for behaviour cloning from closed-loop control data. The deep reinforcement learning stage employs an actor–critic architecture, in which the actor executes the control strategy and the critic evaluates the current control strategy. The training efficiency of IL-TD3 is compared with that of deep deterministic policy gradient (DDPG) and TD3. The simulation results show that IL-TD3 converges faster and trains more stably than both baselines; its convergence rate during training is about double that of DDPG and TD3. The control performance of IL-TD3 is superior to that of PID control in UUV motion control tasks: the average tracking error of IL-TD3 is reduced relative to PID control, and the average tracking error under a thruster fault is almost the same as under normal conditions.
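As a concrete illustration of the two-stage scheme the abstract describes, the PyTorch sketch below first pre-trains the actor by behaviour cloning on state–action pairs logged from a closed-loop controller, then applies TD3's actor–critic update (twin critics, target policy smoothing, delayed actor update). This is a minimal sketch, not the authors' implementation: the network sizes, hyper-parameters (learning rates, gamma, tau, noise levels, policy_delay) and the UUV state/action dimensions are all illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim):
    """Small fully connected network (layer sizes are assumptions)."""
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

class Actor(nn.Module):
    """Deterministic policy: maps the UUV state to a bounded control command."""
    def __init__(self, s_dim, a_dim, a_max):
        super().__init__()
        self.net, self.a_max = mlp(s_dim, a_dim), a_max

    def forward(self, s):
        return self.a_max * torch.tanh(self.net(s))

class TwinCritic(nn.Module):
    """TD3 keeps two Q-networks to curb value overestimation."""
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.q1 = mlp(s_dim + a_dim, 1)
        self.q2 = mlp(s_dim + a_dim, 1)

    def forward(self, s, a):
        sa = torch.cat([s, a], dim=-1)
        return self.q1(sa), self.q2(sa)

def bc_pretrain_step(actor, opt, expert_s, expert_a):
    """Phase 1 (IL): behaviour cloning -- supervised regression of the actor
    onto state/action pairs logged from closed-loop control data."""
    loss = F.mse_loss(actor(expert_s), expert_a)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def td3_update(step, batch, actor, critic, actor_t, critic_t, a_opt, c_opt,
               a_max, gamma=0.99, tau=0.005, noise=0.2, noise_clip=0.5,
               policy_delay=2):
    """Phase 2 (RL): one TD3 step -- target policy smoothing, clipped
    double-Q targets, and a delayed actor update."""
    s, a, r, s2, done = batch  # tensors of shape (B, ...), r/done as (B, 1)
    with torch.no_grad():
        eps = (torch.randn_like(a) * noise).clamp(-noise_clip, noise_clip)
        a2 = (actor_t(s2) + eps).clamp(-a_max, a_max)         # smoothed target action
        q1_t, q2_t = critic_t(s2, a2)
        y = r + gamma * (1.0 - done) * torch.min(q1_t, q2_t)  # clipped double-Q
    q1, q2 = critic(s, a)
    c_loss = F.mse_loss(q1, y) + F.mse_loss(q2, y)
    c_opt.zero_grad(); c_loss.backward(); c_opt.step()
    if step % policy_delay == 0:                              # delayed policy update
        a_loss = -critic(s, actor(s))[0].mean()               # ascend on Q1
        a_opt.zero_grad(); a_loss.backward(); a_opt.step()
        for net, net_t in ((actor, actor_t), (critic, critic_t)):
            for p, p_t in zip(net.parameters(), net_t.parameters()):
                p_t.data.lerp_(p.data, tau)                   # Polyak averaging

# Illustrative setup; the state/action sizes for a UUV are assumptions.
s_dim, a_dim, a_max = 12, 4, 1.0
actor, critic = Actor(s_dim, a_dim, a_max), TwinCritic(s_dim, a_dim)
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
a_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
c_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
```

In this arrangement the behaviour-cloning phase gives the actor a reasonable initial policy before any environment interaction, which is what lets the RL phase converge faster than training DDPG or TD3 from scratch.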

Inspec keywords: neurocontrollers; control engineering computing; gradient methods; motion control; intelligent robots; closed loop systems; autonomous underwater vehicles; remotely operated vehicles; three-term control; learning (artificial intelligence)

Other keywords: actor part; critic part; DDPG; control performance; imitation learning twin delayed deep deterministic policy gradient; UUV motion control tasks; motion control algorithm; supervised learning method; unmanned underwater vehicles via deep imitation reinforcement learning algorithm; PID control; training process; unmanned underwater vehicles; actor–critic architecture; closed-loop control data; current control strategy; IL-TD3 algorithm; deep reinforcement learning; deep imitation reinforcement learning; deep deterministic policy gradient

Subjects: Spatial variables control; Neurocontrol; Interpolation and function approximation (numerical analysis); Knowledge engineering techniques; Telerobotics; Marine system control; Mobile robots; Control engineering computing; Optimisation techniques
