Deep learning-based real-time fine-grained pedestrian recognition using stream processing

IET Intelligent Transport Systems

Real-time recognition of pedestrian details can be important for security in emergency situations, for example when identifying traffic accidents from traffic video. This is challenging because video data mining must be accurate and video processing must keep up with real-time rates. Here, the authors propose DRPRS (deep learning-based real-time fine-grained pedestrian recognition using stream processing), a solution for fine-grained pedestrian recognition in monitoring scenarios that combines deep learning with stream-processing cloud computing. The authors design an improved convolutional neural network (CNN) called fine-CNN, a nine-layer network for detailed pedestrian recognition. In DRPRS, a pedestrian in a surveillance video is segmented and recognised at a fine-grained level using an improved single-shot detector and several fine-CNNs. DRPRS relies on the parallel mechanisms provided by the Apache Storm stream processing framework. To further improve recognition performance, a GPU-based scheduling algorithm is proposed to make full use of the GPU resources in a cluster. The whole recognition process is deployed on a big video data processing platform to meet real-time requirements. DRPRS is extensively evaluated in terms of accuracy, fault tolerance, and performance, and the results show that the proposed approach is efficient.
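
The abstract does not describe how the GPU-based scheduling algorithm works. As a purely illustrative sketch, the Python below uses a generic greedy least-loaded policy, which is a swapped-in technique and not the authors' method. The task names, cost values, GPU names, and the function `assign_tasks` are all hypothetical. It shows one simple way that a detector and several fine-CNN tasks could be spread across the GPUs of a cluster.

```python
import heapq


def assign_tasks(task_costs, gpu_names):
    """Greedy least-loaded placement (illustrative assumption only).

    task_costs: dict mapping a task name to an estimated cost (arbitrary units).
    gpu_names:  list of GPU identifiers available in the cluster.
    Returns (placement, loads): task -> GPU name, and GPU name -> total cost.
    """
    # Min-heap of (accumulated_load, gpu_index); ties go to the lower index.
    heap = [(0.0, i) for i in range(len(gpu_names))]
    heapq.heapify(heap)

    placement = {}
    for task, cost in task_costs.items():
        load, idx = heapq.heappop(heap)        # currently least-loaded GPU
        placement[task] = gpu_names[idx]
        heapq.heappush(heap, (load + cost, idx))

    loads = {gpu_names[idx]: load for load, idx in heap}
    return placement, loads


# Hypothetical workload: one detection stage followed by three fine-CNN stages.
tasks = {"detector": 4.0, "fine_cnn_1": 2.0, "fine_cnn_2": 2.0, "fine_cnn_3": 1.0}
placement, loads = assign_tasks(tasks, ["gpu0", "gpu1"])
print(placement)
print(loads)
```

A real scheduler for a Storm cluster would also need to account for GPU memory, data locality, and the measured throughput of each stage; this sketch ignores all of these.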


