Adaptive learning feature pyramid for object detection

IET Computer Vision

Many state-of-the-art object detection models perform inconsistently on objects of different scales. The feature pyramid network (FPN) alleviates this problem by fusing multi-scale feature maps through a top-down path. However, the feature fusion strategy used in FPN lacks learning ability, which may result in suboptimal model performance. In this study, the authors propose a cross-scale feature fusion network (CSFF) to fuse low-level location feature maps with high-level semantic feature maps. The CSFF first embeds a dilated convolution and a deconvolution layer into the top-down path of the FPN to enhance the learning ability of feature fusion. An attention module is then applied to suppress distraction and interference in the feature map. Each component of the CSFF is highly decoupled and can easily be combined with a base network for end-to-end training. The authors combine the CSFF with Faster R-CNN (faster region-based convolutional neural network) and conduct a series of experiments on the PASCAL VOC 2007 and 2012 object detection datasets. Without any bells and whistles, the CSFF achieves a considerable detection improvement over the baseline network.
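
The contrast between a fixed and a learnable top-down fusion step can be illustrated with a small sketch. It is not the authors' CSFF: it assumes FPN-style fusion means 2x nearest-neighbour upsampling followed by element-wise addition, uses single-channel maps stored as nested lists, and omits the dilated convolution and the attention module. All function names and toy values are hypothetical. The sketch shows that a stride-2 deconvolution (transposed convolution) with an all-ones 2x2 kernel reproduces the fixed upsampling, so a deconvolution with trainable weights includes the fixed scheme as a special case.

```python
# Illustrative sketch only; names and inputs are assumptions, not from the paper.

def upsample_nearest_2x(fmap):
    """Fixed (non-learnable) 2x nearest-neighbour upsampling."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def transposed_conv_2x(fmap, kernel):
    """Stride-2 transposed convolution (deconvolution) with a 2x2 kernel.

    Each input value is scaled by the kernel and written into its own
    2x2 output block; the kernel weights are what training would adjust.
    """
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += fmap[i][j] * kernel[di][dj]
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized maps (the fusion step)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical toy inputs: a coarse high-level map and a fine low-level map.
coarse = [[1.0, 2.0],
          [3.0, 4.0]]
fine = [[0.5] * 4 for _ in range(4)]

# Fixed fusion: upsample the coarse map, then add the fine map.
fixed_fusion = add_maps(fine, upsample_nearest_2x(coarse))

# An all-ones kernel makes the deconvolution match nearest-neighbour upsampling.
ones_kernel = [[1.0, 1.0], [1.0, 1.0]]
deconv_fusion_ones = add_maps(fine, transposed_conv_2x(coarse, ones_kernel))

# Any other kernel yields a different upsampling that training could select.
other_kernel = [[1.0, 0.5], [0.5, 0.25]]
deconv_fusion_other = add_maps(fine, transposed_conv_2x(coarse, other_kernel))
```

In a real detector the kernel would be a multi-channel weight tensor updated by backpropagation; the nested-list version only makes the arithmetic visible.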

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2018.5654