Missing traffic data: comparison of imputation methods

Many traffic management and control applications require highly complete and accurate traffic flow data. However, for reasons such as sensor failure or transmission error, some traffic flow data are commonly lost. As a result, a wide spectrum of techniques has been proposed over the last two decades to estimate missing traffic data. Generally, these missing data imputation methods can be categorised into three kinds: prediction methods, interpolation methods and statistical learning methods. To assess their performance, this paper compares these methods in terms of reconstruction errors, statistical behaviours and running speeds. Results show that statistical learning methods are more effective than the other two kinds of imputation methods when data from a single detector are utilised. Among the methods compared, probabilistic principal component analysis (PPCA) yields the best performance in all aspects. Numerical tests demonstrate that PPCA can be used to impute data online before further analysis (e.g. traffic prediction) and is robust to weather changes.

Inspec keywords: road traffic control; probability; principal component analysis; interpolation; traffic engineering computing

Other keywords: prediction methods; transmission error; PPCA; probabilistic principal component analysis; sensor failure; traffic flow data prediction; traffic management applications; running speeds; interpolation methods; numerical tests; reconstruction errors; missing traffic data estimation; traffic control applications; data imputation methods; statistical behaviours; statistical learning methods

Subjects: Other topics in statistics; Knowledge engineering techniques; Traffic engineering computing; Interpolation and function approximation (numerical analysis)
