Hierarchical reinforcement learning for self-driving decision-making without reliance on labelled driving data

Decision making for self-driving cars is usually tackled either by manually encoding rules derived from drivers' behaviours or by imitating drivers' actions with supervised learning techniques. Both approaches rely on massive amounts of driving data to cover all possible driving scenarios. This study presents a hierarchical reinforcement learning method for decision making of self-driving cars that does not depend on a large amount of labelled driving data. The method jointly considers high-level manoeuvre selection and low-level motion control in both the lateral and longitudinal directions. The authors first decompose the driving task into three manoeuvres — driving in lane, right lane change and left lane change — and learn a sub-policy for each manoeuvre. A master policy is then learned to choose which manoeuvre policy to execute in the current state. All policies, including the master policy and the manoeuvre policies, are represented by fully-connected neural networks and trained using asynchronous parallel reinforcement learners, which build a mapping from sensory outputs to driving decisions. Different state spaces and reward functions are designed for each manoeuvre. The authors apply this method to a highway driving scenario, demonstrating that it can realise smooth and safe decision making for self-driving cars.
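
The two-level structure described above — a master policy that picks a manoeuvre, and a per-manoeuvre sub-policy that outputs lateral and longitudinal control — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the state dimension, layer sizes, and the use of untrained random weights are all assumptions, and the asynchronous parallel training loop is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random weights for a small fully-connected network (training omitted)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass with tanh hidden activations and a linear output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

STATE_DIM = 10  # assumed size of the sensory feature vector
MANOEUVRES = ["drive_in_lane", "right_lane_change", "left_lane_change"]

# Master policy: state -> preference scores over the three manoeuvres.
master = mlp([STATE_DIM, 32, len(MANOEUVRES)])
# One sub-policy per manoeuvre: state -> [steering, acceleration].
sub_policies = {m: mlp([STATE_DIM, 32, 2]) for m in MANOEUVRES}

def decide(state):
    """Hierarchical decision: select a manoeuvre, then compute its control."""
    scores = forward(master, state)
    manoeuvre = MANOEUVRES[int(np.argmax(scores))]
    action = forward(sub_policies[manoeuvre], state)  # lateral, longitudinal
    return manoeuvre, action

manoeuvre, action = decide(rng.standard_normal(STATE_DIM))
```

In the paper each sub-policy also has its own state space and reward function; here a shared state vector is used purely to keep the sketch short.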

Inspec keywords: learning (artificial intelligence); control engineering computing; driver information systems; neural nets; motion control; decision making; parallel processing; road traffic control

Other keywords: self-driving cars; labelled driving data; driving in lane; asynchronous parallel reinforcement learners; decision-making; low-level motion control; right lane change; high-level manoeuvre selection; left lane change; hierarchical reinforcement learning; driving decisions; highway driving scenario; supervised learning; fully-connected neural networks

Subjects: Traffic engineering computing; Spatial variables control; Parallel software; Road-traffic system control; Neural computing techniques; Control engineering computing

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-its.2019.0317