ApesNet: a pixel-wise efficient segmentation network for embedded devices

Road scene understanding and semantic segmentation are ongoing challenges in computer vision. Precise segmentation helps a machine learning model understand the real world more accurately, and a well-designed, efficient model can run on resource-limited devices. The authors aim to implement an efficient, high-level scene understanding model on an embedded device with limited power and resources. Toward this goal, the authors propose ApesNet, an efficient pixel-wise segmentation network that understands road scenes in near real time and achieves promising accuracy. The key findings of the authors' experiments are a significantly lower classification time and high accuracy compared with other conventional segmentation methods. The model is characterised by efficient training and sufficiently fast testing. Experimentally, the authors use two road scene benchmarks, CamVid and Cityscapes, to demonstrate the advantages of ApesNet. The authors compare the proposed architecture's accuracy and time performance with SegNet-Basic, a deep convolutional encoder–decoder architecture. ApesNet is 37% smaller than SegNet-Basic in terms of model size; with this advantage, ApesNet's combined encoding and decoding time per image is 2.5 times faster than SegNet-Basic's.
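To make the encoder–decoder idea concrete, below is a minimal PyTorch sketch of a generic convolutional encoder–decoder for pixel-wise segmentation, in the spirit of the SegNet-Basic baseline mentioned above (convolution + batch normalisation + ReLU blocks, max-pooling with remembered indices, and unpooling in the decoder). It is illustrative only: the class name, channel widths, and layer counts are assumptions made for this sketch and do not reflect the actual ApesNet configuration described in the full paper.

```python
import torch
import torch.nn as nn


class TinyEncoderDecoder(nn.Module):
    """Toy convolutional encoder-decoder for pixel-wise segmentation.

    NOT the ApesNet architecture from the paper; a minimal SegNet-style
    sketch to illustrate the general encoder-decoder structure.
    """

    def __init__(self, in_channels: int = 3, num_classes: int = 12):
        super().__init__()
        # Encoder: two conv blocks, each followed by 2x2 max pooling.
        self.enc1 = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(2, stride=2, return_indices=True)

        # Decoder: unpool with the stored pooling indices, then convolve.
        self.unpool = nn.MaxUnpool2d(2, stride=2)
        self.dec2 = nn.Sequential(
            nn.Conv2d(64, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True))
        self.dec1 = nn.Conv2d(32, num_classes, kernel_size=3, padding=1)

    def forward(self, x):
        x = self.enc1(x)
        x, idx1 = self.pool(x)
        x = self.enc2(x)
        x, idx2 = self.pool(x)
        x = self.unpool(x, idx2)
        x = self.dec2(x)
        x = self.unpool(x, idx1)
        # Per-pixel class scores; argmax over channels gives the label map.
        return self.dec1(x)


if __name__ == "__main__":
    model = TinyEncoderDecoder(num_classes=12)    # CamVid-like class count
    scores = model(torch.randn(1, 3, 360, 480))   # CamVid-sized input
    print(scores.shape)                           # torch.Size([1, 12, 360, 480])
```

The shape check at the end simply confirms that the decoder recovers the input resolution, which is what pixel-wise segmentation requires: one vector of class scores per input pixel.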
