Traffic light control using deep policy-gradient and value-function-based reinforcement learning
- Author(s): Seyed Sajad Mousavi (1); Michael Schukat (1); Enda Howley (1)
- Affiliations:
  1: Discipline of Information Technology, National University of Ireland Galway, Galway, Ireland
- Source: Volume 11, Issue 7, September 2017, pp. 417–423
  DOI: 10.1049/iet-its.2017.0153, Print ISSN 1751-956X, Online ISSN 1751-9578
Recent advances in combining deep neural network architectures with reinforcement learning (RL) techniques have shown promising results on complex control problems with high-dimensional state and action spaces. Inspired by these successes, in this study the authors built two kinds of RL agents, a deep policy-gradient (PG) agent and a value-function-based agent, that can predict the best possible traffic signal for a traffic intersection. At each time step, these adaptive traffic light control agents receive a snapshot of the current state of a graphical traffic simulator and produce a control signal. The PG-based agent maps its observation directly to the control signal, whereas the value-function-based agent first estimates a value for every legal control signal and then selects the action with the highest value. Their methods show promising results in a traffic network simulated in the Simulation of Urban MObility (SUMO) traffic simulator, without suffering from instability issues during the training process.
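As a concrete illustration of the two action-selection schemes described above, the following minimal Python sketch (not the authors' implementation) contrasts a policy-gradient agent with a value-function-based agent. The state size, number of signal phases, and the linear "networks" are hypothetical stand-ins for the paper's deep networks.

# Minimal sketch contrasting the two action-selection schemes from the abstract.
# N_FEATURES, N_PHASES and the linear weight matrices are assumed placeholders,
# not the paper's architecture.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 16   # size of the intersection-state snapshot (assumed)
N_PHASES = 4      # number of legal traffic-signal phases (assumed)

# Stand-ins for trained network parameters.
policy_weights = rng.normal(size=(N_FEATURES, N_PHASES))
value_weights = rng.normal(size=(N_FEATURES, N_PHASES))

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def pg_agent_act(state):
    """Policy-gradient agent: map the observation to a distribution
    over control signals and sample a phase from it."""
    probs = softmax(state @ policy_weights)
    return int(rng.choice(N_PHASES, p=probs))

def value_agent_act(state):
    """Value-function-based agent: estimate a value for every legal
    control signal, then pick the phase with the highest value."""
    q_values = state @ value_weights
    return int(np.argmax(q_values))

state = rng.normal(size=N_FEATURES)   # snapshot of the current intersection state
print("PG agent chooses phase:", pg_agent_act(state))
print("Value-based agent chooses phase:", value_agent_act(state))

In the paper's setting, both stand-in weight matrices would be deep networks trained from simulator snapshots; the sketch only shows the selection step, sampling from a policy versus taking the argmax over estimated values.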
Inspec keywords: road traffic control; adaptive control; traffic engineering computing; learning (artificial intelligence); digital simulation; gradient methods; control engineering computing
Other keywords: training process; complex control problems; high-dimensional state space; urban mobility traffic simulator; action spaces; PG-based agent maps; deep neural network architectures; traffic light control; graphical traffic simulator; traffic intersection; traffic signal; value-function-based reinforcement learning; deep policy-gradient RL algorithm; optimal control; control signals; value-function-based agent RL algorithms; adaptive traffic light control agents
Subjects: Knowledge engineering techniques; Self-adjusting control systems; Road-traffic system control; Traffic engineering computing; Control engineering computing; Optimisation techniques