Driver fatigue and inattention have long been recognised as the main contributing factors in traffic accidents. This study presents a novel system which applies a convolutional neural network (CNN) to automatically learn and predict pre-defined driving postures. The main idea is to monitor the driver's hand position and use the discriminative information extracted from it to predict safe or unsafe driving postures. In comparison to previous approaches, CNNs can automatically learn discriminative features directly from raw images. In the authors' work, a CNN model was first pre-trained by an unsupervised feature learning method called sparse filtering, and subsequently fine-tuned for classification. The approach was verified using the Southeast University driving posture dataset, which comprises video clips covering four driving postures: normal driving, responding to a cell phone call, eating, and smoking. Compared with other popular approaches using different image descriptors and classification methods, the authors' scheme achieves the best performance, with an overall accuracy of 99.78%. To evaluate its effectiveness and generalisation performance in more realistic conditions, the method was further tested on two other specially designed datasets that take into account poor illumination and different road conditions, achieving overall accuracies of 99.3% and 95.77%, respectively.
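As a rough illustration of the unsupervised pre-training step mentioned above, the following is a minimal NumPy sketch of the sparse filtering objective in its commonly published form: soft-absolute activations, row normalisation, column normalisation, then an L1 penalty. The function name, argument shapes, and `eps` value are assumptions made for illustration only. This is not the authors' implementation, and it omits patch extraction, optimisation, and the integration with the CNN layers.

```python
import numpy as np

def sparse_filtering_objective(W, X, eps=1e-8):
    """Sparse filtering loss (illustrative sketch, hypothetical helper).

    W: weights, shape (n_features, n_inputs)
    X: data, shape (n_inputs, n_examples), e.g. flattened image patches
    """
    # Soft absolute value keeps the activation differentiable near zero.
    F = np.sqrt((W @ X) ** 2 + eps)
    # Normalise each feature (row) to unit L2 norm across examples.
    F = F / np.linalg.norm(F, axis=1, keepdims=True)
    # Normalise each example (column) to unit L2 norm across features.
    F = F / np.linalg.norm(F, axis=0, keepdims=True)
    # The L1 penalty on the normalised features encourages sparse responses.
    return float(np.sum(F))
```

In practice this loss would be minimised over `W` with a gradient-based optimiser, and the learned filters could then initialise a convolutional layer before supervised fine-tuning.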

Driving posture recognition by convolutional neural networks
