© The Institution of Engineering and Technology
Recent advances in combining deep neural network architectures with reinforcement learning (RL) techniques have shown promising results in solving complex control problems with high-dimensional state and action spaces. Inspired by these successes, in this study the authors built two kinds of RL agents, deep policy-gradient (PG) and value-function-based, that predict the best possible traffic signal for a traffic intersection. At each time step, these adaptive traffic light control agents receive a snapshot of the current state of a graphical traffic simulator and produce control signals. The PG-based agent maps its observation directly to the control signal, whereas the value-function-based agent first estimates values for all legal control signals and then selects the control action with the highest value. Their methods show promising results in a traffic network simulated in the Simulation of Urban Mobility (SUMO) traffic simulator, without suffering from instability issues during training.
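The two action-selection schemes contrasted in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' published code: the action names, toy value estimates, and toy policy distribution are hypothetical, and the deep networks that would produce them are omitted.

```python
import random

ACTIONS = ["NS_green", "EW_green"]  # hypothetical legal signal phases


def value_based_select(q_values):
    """Value-function agent: given an estimated value for every legal
    signal, greedily pick the signal with the highest value."""
    return max(q_values, key=q_values.get)


def policy_gradient_select(probs, rng=random):
    """PG agent: the policy maps the state directly to a probability
    distribution over signals; the signal is sampled from it."""
    r, acc = rng.random(), 0.0
    for action, p in probs.items():
        acc += p
        if r < acc:
            return action
    return action  # guard against floating-point round-off


# Toy outputs standing in for the two networks' predictions:
q = {"NS_green": 1.7, "EW_green": 0.4}    # value estimates per signal
pi = {"NS_green": 0.9, "EW_green": 0.1}   # policy distribution

print(value_based_select(q))       # -> NS_green (highest value)
print(policy_gradient_select(pi))  # sampled, usually NS_green here
```

The key design difference this highlights is that the value-based agent needs one scalar estimate per legal action before it can act, while the PG agent commits to a (stochastic) action in a single forward mapping from the observed state.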
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-its.2017.0153