Traffic light control using deep policy-gradient and value-function-based reinforcement learning
- Author(s): Seyed Sajad Mousavi (1); Michael Schukat (1); Enda Howley (1)
- Affiliations:
  1: Discipline of Information Technology, National University of Ireland Galway, Galway, Ireland
- Source: Volume 11, Issue 7, September 2017, pp. 417–423
  DOI: 10.1049/iet-its.2017.0153, Print ISSN 1751-956X, Online ISSN 1751-9578
Recent advances in combining deep neural network architectures with reinforcement learning (RL) techniques have shown promising results on complex control problems with high-dimensional state and action spaces. Inspired by these successes, in this study the authors built two kinds of RL agents, a deep policy-gradient (PG) agent and a value-function-based agent, that can predict the best possible traffic signal for a traffic intersection. At each time step, these adaptive traffic light control agents receive a snapshot of the current state of a graphical traffic simulator and produce a control signal. The PG-based agent maps its observation directly to the control signal, whereas the value-function-based agent first estimates a value for every legal control signal and then selects the action with the highest value. Their methods show promising results in a traffic network simulated in the Simulation of Urban MObility (SUMO) traffic simulator, without suffering from instability issues during the training process.
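As a concrete illustration of the two action-selection schemes described above, the following minimal Python sketch (not the authors' implementation) contrasts a policy-gradient agent with a value-function-based agent. The state size, number of signal phases, and the linear "networks" are hypothetical stand-ins for the paper's deep networks.

# Minimal sketch contrasting the two action-selection schemes from the abstract.
# N_FEATURES, N_PHASES and the linear weight matrices are assumed placeholders,
# not the paper's architecture.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 16   # size of the intersection-state snapshot (assumed)
N_PHASES = 4      # number of legal traffic-signal phases (assumed)

# Stand-ins for trained network parameters.
policy_weights = rng.normal(size=(N_FEATURES, N_PHASES))
value_weights = rng.normal(size=(N_FEATURES, N_PHASES))

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def pg_agent_act(state):
    """Policy-gradient agent: map the observation to a distribution
    over control signals and sample a phase from it."""
    probs = softmax(state @ policy_weights)
    return int(rng.choice(N_PHASES, p=probs))

def value_agent_act(state):
    """Value-function-based agent: estimate a value for every legal
    control signal, then pick the phase with the highest value."""
    q_values = state @ value_weights
    return int(np.argmax(q_values))

state = rng.normal(size=N_FEATURES)   # snapshot of the current intersection state
print("PG agent chooses phase:", pg_agent_act(state))
print("Value-based agent chooses phase:", value_agent_act(state))

In the paper's setting, both stand-in weight matrices would be deep networks trained from simulator snapshots; the sketch only shows the selection step, sampling from a policy versus taking the argmax over estimated values.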
Inspec keywords: road traffic control; adaptive control; traffic engineering computing; learning (artificial intelligence); digital simulation; gradient methods; control engineering computing
Other keywords: training process; complex control problems; high-dimensional state space; urban mobility traffic simulator; action spaces; PG-based agent maps; deep neural network architectures; traffic light control; graphical traffic simulator; traffic intersection; traffic signal; value-function-based reinforcement learning; deep policy-gradient RL algorithm; optimal control; control signals; value-function-based agent RL algorithms; adaptive traffic light control agents
Subjects: Knowledge engineering techniques; Self-adjusting control systems; Road-traffic system control; Traffic engineering computing; Control engineering computing; Optimisation techniques