
ResFusion: deeply fused scene parsing network for RGB-D images

IET Computer Vision

Scene parsing is a challenging task for complex and diverse scenes. In this study, the authors address the problem of semantic segmentation of indoor scenes in red–green–blue plus depth (RGB-D) images. Most existing work uses only the colour, i.e. photometric, information for this problem. Here, the authors present an approach that fuses feature maps between a colour network branch and a depth network branch, integrating photometric and geometric information and thereby improving semantic segmentation performance. They propose a novel convolutional neural network that uses ResNet as its baseline. The proposed network adopts a spatial pyramid pooling module to make full use of different sub-region representations, and multiple feature-map fusion modules to integrate texture and structure information between the colour and depth branches. Moreover, the network trains multiple auxiliary loss branches alongside the main loss function to keep gradients from vanishing in the early layers and to accelerate training of the fusion part. Comprehensive experimental evaluations show that the proposed network, 'ResFusion', improves performance greatly over the baseline network and achieves competitive performance compared with other state-of-the-art methods on the challenging SUN RGB-D benchmark.
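
The abstract does not specify how the two branches are fused, only that feature maps are combined between them. The following minimal PyTorch sketch shows one plausible fusion module: element-wise summation of the depth features into the colour features, followed by a 1×1 convolution. Both the summation strategy and the projection layer are assumptions for illustration, not the authors' published design.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Hypothetical fusion of a depth feature map into the colour branch.

    Sketch only: element-wise summation plus a 1x1 projection is assumed;
    the paper states only that feature maps are fused between branches.
    """
    def __init__(self, channels):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb_feat, depth_feat):
        # Summation integrates photometric (RGB) and geometric (depth)
        # information from feature maps of matching shape.
        return self.project(rgb_feat + depth_feat)

# Example: fusing 256-channel feature maps at 60x60 resolution.
fuse = FusionBlock(256)
out = fuse(torch.randn(1, 256, 60, 60), torch.randn(1, 256, 60, 60))
```

In a full two-branch network, one such block would typically sit after each ResNet stage, with `channels` matching that stage's output width.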
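
The spatial pyramid pooling module is likewise described only at a high level. The sketch below follows the well-known PSPNet-style formulation: pool the feature map onto several grid sizes, reduce channels, upsample back, and concatenate with the input so every position sees several sub-region contexts. Assuming this formulation and the illustrative bin sizes (1, 2, 3, 6) is an inference from the abstract, not a confirmed detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPyramidPooling(nn.Module):
    """PSPNet-style pyramid pooling (assumed design, for illustration)."""
    def __init__(self, in_channels, bin_sizes=(1, 2, 3, 6)):
        super().__init__()
        reduced = in_channels // len(bin_sizes)
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),  # pool to a size x size grid
                nn.Conv2d(in_channels, reduced, kernel_size=1, bias=False),
                nn.BatchNorm2d(reduced),
                nn.ReLU(inplace=True),
            )
            for size in bin_sizes
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        pyramids = [x]
        for stage in self.stages:
            # Upsample each pooled sub-region representation back to the
            # input resolution before concatenation.
            pyramids.append(F.interpolate(stage(x), size=(h, w),
                                          mode='bilinear',
                                          align_corners=False))
        # Output has roughly twice the input channel count.
        return torch.cat(pyramids, dim=1)
```

With a 2048-channel ResNet output, this yields 4096 concatenated channels, which a subsequent convolution would normally compress before classification.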
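
Finally, the auxiliary loss branches can be read as intermediate classifiers whose weighted cross-entropy losses are added to the main loss, injecting gradient directly into earlier layers. A minimal sketch follows; the weight of 0.4 (common practice in pyramid-pooling networks) and the use of plain cross-entropy are assumptions, not values taken from the paper.

```python
import torch.nn.functional as F

def total_loss(main_logits, aux_logits_list, target, aux_weight=0.4):
    """Main segmentation loss plus weighted auxiliary losses (sketch).

    main_logits: (N, C, H, W) predictions from the final classifier.
    aux_logits_list: predictions from hypothetical intermediate classifiers.
    target: (N, H, W) ground-truth class indices.
    """
    loss = F.cross_entropy(main_logits, target)
    for aux_logits in aux_logits_list:
        # Each auxiliary branch feeds gradient into an intermediate layer,
        # mitigating vanishing gradients and speeding up training of the
        # fusion part.
        loss = loss + aux_weight * F.cross_entropy(aux_logits, target)
    return loss
```

At inference time only the main classifier is used; the auxiliary branches exist purely to shape the gradient during training.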
