
Scale specified single shot multibox detector

Detecting objects at vastly different scales is a fundamental challenge in computer vision. To address it, some approaches (e.g. TridentNet) investigate the effect of receptive fields, whereas others (e.g. SNIP, SNIPER) rely on an image pyramid strategy. In this study, a novel single-shot detector, called the scale specified single-shot multibox detector (4SD), is proposed. It aims to predict objects of each specific scale range separately by using feature maps of different sizes. First, a scale-specific inference module generates a parallel multi-branch architecture with feature maps of different sizes. Then, the authors propose a scale-specific training scheme that specialises each branch by sampling object instances of the appropriate scales for training. Results are reported on both the PASCAL VOC and MS-COCO detection benchmarks. The proposed method achieves a mean average precision of 83.1% on PASCAL VOC 2007, and 36.9% on MS-COCO at a speed of 28 frames per second, which is superior to most single-stage detectors.
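
The idea of sampling object instances of appropriate scales for each branch can be illustrated with a minimal sketch. It is not the authors' implementation: the number of branches, the scale ranges in `BRANCH_RANGES`, the definition of scale as the square root of the box area, and the helper names `box_scale` and `sample_for_branches` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

# Hypothetical valid scale ranges (in pixels) for three branches. The
# abstract does not give the actual ranges or the number of branches.
BRANCH_RANGES = [(0, 64), (32, 160), (128, math.inf)]


def box_scale(box):
    """Scale of an (x1, y1, x2, y2) box, assumed here to be sqrt(width * height)."""
    x1, y1, x2, y2 = box
    return math.sqrt(max(x2 - x1, 0) * max(y2 - y1, 0))


def sample_for_branches(boxes, ranges=BRANCH_RANGES):
    """Return, for each branch, the boxes whose scale lies in [lo, hi)."""
    return [[b for b in boxes if lo <= box_scale(b) < hi] for lo, hi in ranges]


# Three ground-truth boxes with scales of 20, 50 and roughly 245 pixels.
boxes = [(0, 0, 20, 20), (0, 0, 50, 50), (0, 0, 300, 200)]
per_branch = sample_for_branches(boxes)
```

In this sketch the ranges overlap, so the 50-pixel box is assigned to both the first and the second branch. Whether the proposed scheme uses overlapping ranges is not stated in the abstract.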


