Combining 2D and 3D features to improve road detection based on stereo cameras

Guorong Cai; Songzhi Su; Wenli He; Yundong Wu; Shaozi Li

Author(s): Guorong Cai^{1,2}; Songzhi Su^{3}; Wenli He^{3}; Yundong Wu^{1,2}; Shaozi Li^{3}
- Affiliations: 1: Computer Engineering College, Jimei University, Xiamen City, People's Republic of China;
  2: Fujian Collaborative Innovation Center for Big Data Applications in Governments, Fuzhou 350003, People's Republic of China;
  3: Department of Intelligent Science and Technology, Xiamen University, Xiamen City, People's Republic of China
Source: Volume 12, Issue 6, September 2018, pp. 834–843
DOI: 10.1049/iet-cvi.2017.0266, Print ISSN 1751-9632, Online ISSN 1751-9640

Received 15/05/2017, Accepted 04/04/2018, Revised 05/02/2018, Published 10/04/2018

Road detection is a fundamental component of autonomous driving systems, since it provides the valid driving space and candidate object regions for driving decisions. The core of road detection methods is extracting effective and discriminative features. Since two-dimensional (2D) and three-dimensional (3D) features are complementary, the authors propose a robust multi-feature combination and optimisation framework for stereo image pairs, called Feature++. First, several 2D and 3D features, such as Gabor and plane features respectively, are extracted after generating 2D super-pixels and a 3D depth image from stereo matching. Second, the combined features are fed into a three-layer shallow neural network classifier to decide whether each super-pixel belongs to the road region. Finally, the classification results are further refined using a fully connected conditional random field (CRF), taking the content information into consideration. The authors extensively evaluate the performance of four 2D features, four 3D features, and their combinations. Experiments conducted on the KITTI ROAD benchmark show that (i) combining 2D and 3D features greatly improves road detection performance and (ii) using the CRF as a refinement step is necessary. Overall, the proposed Feature++ method outperforms most manually designed features and is comparable with state-of-the-art methods based on deep learning.
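The combination and classification steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the feature dimensions, the hidden-layer size, the tanh and sigmoid activations, the random untrained weights, and the function names `combine_features`, `init_network` and `road_probability` are all assumptions made for illustration. Random arrays stand in for the per-super-pixel 2D (for example Gabor) and 3D (for example plane) descriptors, and the CRF refinement step is omitted.

```python
import numpy as np


def combine_features(feat_2d, feat_3d):
    """Concatenate per-super-pixel 2D and 3D descriptors along the feature axis."""
    if feat_2d.shape[0] != feat_3d.shape[0]:
        raise ValueError("both feature sets must describe the same super-pixels")
    return np.concatenate([feat_2d, feat_3d], axis=1)


def init_network(n_in, n_hidden, rng):
    """Create untrained weights for an input -> hidden -> output network."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def road_probability(params, features):
    """Forward pass: one hidden tanh layer, then a sigmoid road score per super-pixel."""
    hidden = np.tanh(features @ params["W1"] + params["b1"])
    logits = hidden @ params["W2"] + params["b2"]
    return (1.0 / (1.0 + np.exp(-logits))).ravel()


rng = np.random.default_rng(0)
n_superpixels = 5
feat_2d = rng.random((n_superpixels, 8))  # hypothetical stand-in for 2D descriptors
feat_3d = rng.random((n_superpixels, 4))  # hypothetical stand-in for 3D descriptors

combined = combine_features(feat_2d, feat_3d)
params = init_network(combined.shape[1], n_hidden=16, rng=rng)
probs = road_probability(params, combined)
is_road = probs > 0.5  # per-super-pixel road / non-road decision
```

Because the weights here are random, the decisions carry no meaning; in practice the network would be trained on labelled super-pixels, and its per-super-pixel scores would then be passed to the refinement stage.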
