Traffic light control using deep policy-gradient and value-function-based reinforcement learning

Recent advances in combining deep neural network architectures with reinforcement learning (RL) techniques have shown promising results on complex control problems with high-dimensional state and action spaces. Inspired by these successes, in this study the authors build two kinds of RL agents, deep policy-gradient (PG) and value-function-based, that predict the best possible traffic signal for a traffic intersection. At each time step, these adaptive traffic-light control agents receive a snapshot of the current state of a graphical traffic simulator and produce control signals. The PG-based agent maps its observation directly to a control signal, whereas the value-function-based agent first estimates a value for every legal control signal and then selects the action with the highest value. Their methods show promising results on a traffic network simulated in the Simulation of Urban MObility (SUMO) traffic simulator, without suffering from instability during training.
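The sketch below contrasts the two action-selection schemes the abstract describes: a PG agent that samples a phase from its predicted distribution, and a value-based agent that scores every legal phase and takes the argmax. This is an illustration, not the authors' code; the four-phase action space, network sizes, and flattened-snapshot state are assumptions.

import torch
import torch.nn as nn

STATE_DIM = 64 * 64   # flattened snapshot of the simulated intersection (assumed size)
N_PHASES = 4          # number of legal traffic-signal phases (assumed)

class PolicyNet(nn.Module):
    # PG agent: maps the state snapshot directly to a distribution over phases.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, N_PHASES),
        )

    def forward(self, state):
        return torch.softmax(self.net(state), dim=-1)

class QNet(nn.Module):
    # Value-based agent: estimates a value for every legal phase.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, N_PHASES),
        )

    def forward(self, state):
        return self.net(state)

state = torch.rand(1, STATE_DIM)  # stand-in for a simulator snapshot

# PG agent: sample a phase from the predicted distribution.
pg_action = torch.distributions.Categorical(PolicyNet()(state)).sample().item()

# Value-based agent: evaluate all phases, then pick the highest-valued one.
vf_action = QNet()(state).argmax(dim=-1).item()

print(f"PG agent: phase {pg_action}; value-based agent: phase {vf_action}")

In a complete training loop, the PG network would typically be updated with a REINFORCE-style gradient (log-probability of the chosen phase weighted by the observed return), while the value network would be regressed toward Q-learning-style temporal-difference targets; both are standard choices consistent with, though not taken from, the paper.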

Inspec keywords: road traffic control; adaptive control; traffic engineering computing; learning (artificial intelligence); digital simulation; gradient methods; control engineering computing

Other keywords: training process; complex control problems; high-dimensional state and action spaces; simulation of urban mobility (SUMO) traffic simulator; PG-based agent; deep neural network architectures; traffic light control; graphical traffic simulator; traffic intersection; traffic signal; value-function-based reinforcement learning; deep policy-gradient RL algorithm; optimal control; control signals; value-function-based RL agents; adaptive traffic light control agents

Subjects: Knowledge engineering techniques; Self-adjusting control systems; Road-traffic system control; Traffic engineering computing; Control engineering computing; Optimisation techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-its.2017.0153