This study addresses holistic road scene understanding by integrating visual and range data. To this end, the authors propose an approach that jointly tackles object-level image segmentation and semantic region labelling within a conditional random field (CRF) framework. Specifically, the authors first generate semantic object hypotheses by clustering 3D points, learning their prior appearance models, and inferring their semantic categories with a deep learning method. The learned priors, together with spatial and geometric contexts, are incorporated into the CRF. With this formulation, visual and range data are thoroughly fused, and the coupled segmentation and semantic labelling problem can be solved via graph cuts. The authors’ approach is validated on the challenging KITTI dataset, which contains diverse and complicated road scenarios. Both quantitative and qualitative evaluations demonstrate its effectiveness.
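The first step, generating object hypotheses by clustering 3D points, can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm is used, so the example below assumes plain Euclidean distance-threshold clustering. The function name `euclidean_clusters`, the parameters `radius` and `min_size`, and the toy points are all hypothetical and are not taken from the paper.

```python
from collections import deque
from math import dist  # Python 3.8+


def euclidean_clusters(points, radius, min_size=1):
    """Group 3D points into clusters by region growing.

    Two points end up in the same cluster when a chain of points,
    each within `radius` of the next, connects them. Clusters with
    fewer than `min_size` points are discarded (assumed noise filter).
    Returns a sorted list of sorted index lists.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            # Brute-force neighbour search; a k-d tree would be used at scale.
            near = [j for j in unvisited if dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return sorted(clusters)


# Toy point cloud (hypothetical): two compact groups and one stray point.
points = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.6, 0.1, 0.0),
          (5.0, 5.0, 0.0), (5.2, 5.1, 0.0), (20.0, 0.0, 0.0)]
clusters = euclidean_clusters(points, radius=0.5, min_size=2)
print(clusters)  # [[0, 1, 2], [3, 4]]
```

Each resulting index list would act as one object hypothesis. In the pipeline the abstract describes, such hypotheses are then given appearance priors and semantic categories before entering the CRF.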

Fusion-based holistic road scene understanding
