Scheduling traffic signals at intersections is an application of artificial intelligence systems. This study presents a new forward search algorithm based on dynamic programming (FSDP) over a decision tree, and explores an efficient solution for a real-time adaptive traffic signal control policy. The algorithm considers traffic signal control with both fixed and variable phase sequences. Owing to the properties of forward search dynamic programming and to the authors' proposed handling of repeated or invalid traffic states, the FSDP algorithm reduces the number of states and saves considerable computation time. Consequently, FSDP can serve as an on-line algorithm when applied to a complicated traffic control problem. Moreover, the authors propose a labelled position method to recover the optimal policy once the goal state has been reached. For practical operation, the new algorithm is extended with the rolling horizon approach, and several derived methods are compared with optimal fixed-time control and adaptive control in terms of traffic delay. Experimental results from simulations of symmetrical and asymmetrical traffic flow scenarios show that the FSDP method performs well in traffic control, with high efficiency and good solution quality.
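
The general idea of a forward search with merging of repeated states, combined with a rolling horizon, can be illustrated with a minimal sketch. It is not the authors' FSDP implementation: the two-approach intersection, the state layout `(queue 0, queue 1, green phase)`, the one-step clearance penalty on a phase change, the saturation flow `sat`, the delay measure (total queue length per step) and all function names are assumptions chosen for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Assumed state layout: (queue on approach 0, queue on approach 1, current green phase)
State = Tuple[int, int, int]
Arrivals = Tuple[int, int]


def step(state: State, action: int, arrivals: Arrivals, sat: int) -> Tuple[State, int]:
    """Advance one time step; return the new state and the delay incurred."""
    queues = [state[0], state[1]]
    # Assumption: changing phase costs one clearance step with no departures.
    if action == state[2]:
        queues[action] = max(0, queues[action] - sat)
    queues[0] += arrivals[0]
    queues[1] += arrivals[1]
    return (queues[0], queues[1], action), queues[0] + queues[1]


def forward_search(start: State, arrivals: List[Arrivals], sat: int = 2) -> Tuple[int, List[int]]:
    """Expand the decision tree layer by layer, merging repeated states."""
    # Each layer maps a reachable state to (best cost so far, parent state, action taken).
    layers: List[Dict[State, Tuple[int, Optional[State], Optional[int]]]] = [
        {start: (0, None, None)}
    ]
    for arr in arrivals:
        nxt: Dict[State, Tuple[int, Optional[State], Optional[int]]] = {}
        for state, (cost, _, _) in layers[-1].items():
            for action in (0, 1):
                new_state, delay = step(state, action, arr, sat)
                total = cost + delay
                # A repeated state is kept once, with its cheapest path.
                if new_state not in nxt or total < nxt[new_state][0]:
                    nxt[new_state] = (total, state, action)
        layers.append(nxt)

    # Pick the cheapest final state and backtrack through the stored parents.
    state = min(layers[-1], key=lambda s: layers[-1][s][0])
    best_cost = layers[-1][state][0]
    plan: List[int] = []
    for t in range(len(arrivals), 0, -1):
        _, parent, action = layers[t][state]
        plan.append(action)
        state = parent
    plan.reverse()
    return best_cost, plan


def rolling_horizon(start: State, arrivals: List[Arrivals],
                    horizon: int = 4, sat: int = 2) -> Tuple[int, List[int]]:
    """Re-plan at every step over a short horizon and apply only the first action."""
    state, total, applied = start, 0, []
    for t in range(len(arrivals)):
        _, plan = forward_search(state, arrivals[t:t + horizon], sat)
        state, delay = step(state, plan[0], arrivals[t], sat)
        total += delay
        applied.append(plan[0])
    return total, applied
```

In this sketch the dictionary lookup plays the role of removing repeated states: two signal plans that reach the same queues and phase at the same step share one entry, so the tree does not grow with every branch. Calling `forward_search((3, 0, 0), [(0, 0), (0, 0)], sat=2)` searches a two-step horizon with three vehicles queued on the approach that currently has green.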


Forward search algorithm based on dynamic programming for real-time adaptive traffic signal control
