Crowd counting by the dual-branch scale-aware network with ranking loss constraints

IET Computer Vision

Crowd counting in images is a challenging problem. This study proposes a new deep learning method that estimates the crowd count in congested scenes. The proposed network has two major components: the first ten layers of VGG16 serve as the backbone, and a dual-branch network (with branches named Branch_S and Branch_D) forms the second part. Branch_S uses a shallow fully convolutional network to extract low-level information (head blobs), while Branch_D uses a deep fully convolutional network to extract high-level context features (faces and bodies). Because the two branches learn features at different scales, together they can handle the scale variation caused by perspective effects and differences in image size. The features extracted by the two branches are fused to generate the predicted density map. Based on the fact that an original image must contain at least as many people as any of its sub-images, a ranking loss function that exploits this constraint within an image is proposed. This ranking loss is combined with the Euclidean loss to form the final loss function. Our approach is evaluated on three benchmark datasets and achieves better results than state-of-the-art methods.
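The ranking constraint and the combined loss described above can be sketched as follows. This is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation: the hinge form max(0, c_sub - c_parent), the centred nested-crop scheme in the hypothetical `nested_boxes` helper, the 1/(2N) normalisation of the Euclidean loss and the weight `lam` are all illustrative choices. Each entry of `nested_counts` is assumed to be the sum of the density map the network predicts when fed the corresponding cropped image in a separate forward pass; cropping the full image's predicted density map instead would satisfy the constraint trivially whenever densities are non-negative.

```python
from typing import List, Sequence, Tuple

DensityMap = Sequence[Sequence[float]]  # 2-D density map, row-major
Box = Tuple[int, int, int, int]         # (top, left, height, width)


def predicted_count(density: DensityMap) -> float:
    """Estimated number of people: the sum (integral) of a density map."""
    return sum(sum(row) for row in density)


def nested_boxes(height: int, width: int, scale: float = 0.75, levels: int = 2) -> List[Box]:
    """Hypothetical crop scheme: each box is a centred crop of the previous one,
    shrunk by `scale`, so every box lies inside its predecessor."""
    boxes: List[Box] = [(0, 0, height, width)]
    for _ in range(levels):
        top, left, h, w = boxes[-1]
        nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
        boxes.append((top + (h - nh) // 2, left + (w - nw) // 2, nh, nw))
    return boxes


def ranking_loss(nested_counts: Sequence[float]) -> float:
    """Hinge ranking loss over counts predicted for nested regions.

    nested_counts[0] is the count for the full image; each later entry is the
    count predicted for a sub-image contained in all earlier ones. A region
    holds at least as many people as any region inside it, so every pair
    i < j should satisfy c_i >= c_j; each violation is penalised linearly.
    """
    loss = 0.0
    for i in range(len(nested_counts)):
        for j in range(i + 1, len(nested_counts)):
            loss += max(0.0, nested_counts[j] - nested_counts[i])
    return loss


def euclidean_loss(preds: Sequence[DensityMap], gts: Sequence[DensityMap]) -> float:
    """Pixel-wise Euclidean loss, 1/(2N) * sum_i ||pred_i - gt_i||^2, over N maps."""
    total = 0.0
    for pred, gt in zip(preds, gts):
        total += sum((p - g) ** 2 for pr, gr in zip(pred, gt) for p, g in zip(pr, gr))
    return total / (2 * len(preds))


def total_loss(preds, gts, nested_counts, lam: float = 0.1) -> float:
    """Final loss: Euclidean loss plus lam times the ranking loss (lam is assumed)."""
    return euclidean_loss(preds, gts) + lam * ranking_loss(nested_counts)


if __name__ == "__main__":
    pred = [[0.5, 0.0], [0.25, 0.25]]
    gt = [[0.5, 0.25], [0.0, 0.25]]
    print(predicted_count(pred))             # 1.0
    print(euclidean_loss([pred], [gt]))      # 0.0625
    print(ranking_loss([10.0, 12.0, 7.0]))   # 2.0: the first sub-image is over-counted
    print(nested_boxes(256, 256))
```

Because the hinge is zero whenever the ordering already holds, the ranking term only contributes when a sub-image is predicted to contain more people than a region enclosing it. It therefore acts as a regulariser on top of the Euclidean density loss rather than replacing it, and it needs no extra annotation, since the ordering follows from the crop geometry alone.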
