YOLOpeds: efficient real-time single-shot pedestrian detection for smart camera applications

Deep-learning-based pedestrian detectors can enhance the capabilities of smart camera systems in a wide spectrum of machine vision applications, including video surveillance, autonomous driving, robots and drones, smart factories, and health monitoring. However, such complex models do not scale easily and are not traditionally implemented on resource-constrained smart cameras for on-device processing, which offers significant advantages when real-time monitoring and privacy are vital. This work addresses the challenge of achieving a good trade-off between accuracy and speed for efficient deep-learning-based pedestrian detection in smart camera applications. The contributions of this work are the following: 1) a computationally efficient architecture based on separable convolutions that integrates dense connections across layers and multi-scale feature fusion to improve representational capacity while decreasing the number of parameters and operations; 2) a more elaborate loss function for improved localization; and 3) an anchor-less approach to detection. The proposed approach, referred to as YOLOpeds, is evaluated on 320 × 320 images from the PETS2009 surveillance dataset. A real-system implementation is presented using the Jetson TX2 embedded platform. YOLOpeds provides sustained real-time operation at over 30 frames per second with detection rates of around 86%, outperforming existing deep learning models.
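The abstract states that separable convolutions reduce the number of parameters and operations. The standard cost formulas below illustrate why. This minimal Python sketch compares a standard k × k convolution with a depthwise-separable one, meaning a depthwise k × k stage followed by a pointwise 1 × 1 stage. It assumes stride 1, "same" padding, and no biases. The layer sizes (3 × 3 kernel, 64 input channels, 128 output channels, an 80 × 80 feature map) are hypothetical and do not come from the YOLOpeds architecture. The helper names `standard_conv_cost`, `separable_conv_cost` and `frame_budget_ms` are illustrative only. Dense connections and multi-scale fusion are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvCost:
    """Weight count and multiply-accumulate (MAC) count of one layer, biases ignored."""
    params: int
    macs: int


def standard_conv_cost(k: int, c_in: int, c_out: int, h: int, w: int) -> ConvCost:
    # Each output channel owns a k x k x c_in kernel, evaluated at all h*w
    # output positions (stride 1 and 'same' padding are assumed).
    params = k * k * c_in * c_out
    return ConvCost(params, params * h * w)


def separable_conv_cost(k: int, c_in: int, c_out: int, h: int, w: int) -> ConvCost:
    # Depthwise stage: one k x k filter per input channel, no channel mixing.
    depthwise = k * k * c_in
    # Pointwise stage: a 1 x 1 convolution mixing c_in channels into c_out.
    pointwise = c_in * c_out
    params = depthwise + pointwise
    return ConvCost(params, params * h * w)


def frame_budget_ms(fps: float) -> float:
    # Per-frame latency budget required to sustain the given frame rate.
    return 1000.0 / fps


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels, 80 x 80 feature map.
    std = standard_conv_cost(3, 64, 128, 80, 80)
    sep = separable_conv_cost(3, 64, 128, 80, 80)
    print(std)  # ConvCost(params=73728, macs=471859200)
    print(sep)  # ConvCost(params=8768, macs=56115200)
    print(f"cost ratio: {sep.params / std.params:.4f}")  # cost ratio: 0.1189
    print(f"budget at 30 FPS: {frame_budget_ms(30):.1f} ms")  # budget at 30 FPS: 33.3 ms
```

The ratio of separable to standard cost works out to 1/c_out + 1/k². For 3 × 3 kernels it therefore approaches 1/9 as the channel count grows. This reflects the general mechanism behind the savings and is not a measurement of YOLOpeds itself. The frame-budget line shows that sustaining 30 frames per second leaves roughly 33 ms per frame for the whole pipeline.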
