Video object segmentation via attention-modulating networks

This Letter presents an attention-modulating network for video object segmentation that adapts its segmentation model well to the annotated frame. Specifically, the authors first develop an efficient visual and spatial attention modulator that rapidly modulates the segmentation model to focus on the specific object of interest. They then design a channel and spatial attention module and inject it into the segmentation model to further refine its feature maps. In addition, to fuse multi-scale context information, they construct a feature pyramid attention module that further processes the top-layer feature maps, achieving better pixel-level attention for the high-level feature maps. Finally, to address the sample-imbalance issue in training, they employ the focal loss, which distinguishes easy samples from difficult ones and accelerates the convergence of network training. Extensive evaluations on the DAVIS 2017 dataset show that the proposed approach achieves state-of-the-art performance, outperforming the baseline OSMN by 3.6% and 5.4% in terms of IoU and F-measure without fine-tuning.
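The focal loss the authors employ (Lin et al., ICCV 2017) can be sketched as below for the binary foreground/background case. This is an illustrative numpy implementation, not code from the Letter; the `gamma` and `alpha` defaults follow the original focal-loss paper, and the Letter's exact hyper-parameters are not stated in the abstract.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss.
    p: predicted foreground probabilities, y: binary ground-truth mask.
    The (1 - p_t)^gamma factor down-weights easy, well-classified pixels,
    so the gradient is dominated by hard samples."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)            # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

A confidently correct prediction (e.g. p = 0.9 for a foreground pixel) incurs a much smaller loss than a hard one (p = 0.1), which is how the loss "distinguishes easy samples from difficult ones".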
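The channel and spatial attention module follows the CBAM pattern cited by the authors: per-channel gates from spatially pooled statistics, then per-pixel gates from channel-pooled statistics. The numpy sketch below is a hedged illustration only; the weight matrices `w1`/`w2` are hypothetical stand-ins for a learned shared MLP, and a fixed average replaces CBAM's learned 7x7 convolution in the spatial branch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """feat: (C, H, W). Average- and max-pool over space, pass both pooled
    vectors through a shared two-layer MLP (w1: hidden x C, w2: C x hidden),
    sum, and squash to per-channel gates in (0, 1)."""
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer
    gate = sigmoid(mlp(avg) + mlp(mx))             # shape (C,)
    return feat * gate[:, None, None]

def spatial_attention(feat):
    """feat: (C, H, W). Pool over channels to two (H, W) maps; a fixed
    average of the two stands in for CBAM's learned conv, giving a
    per-pixel gate that rescales every channel."""
    avg = feat.mean(axis=0)
    mx = feat.max(axis=0)
    gate = sigmoid((avg + mx) / 2.0)               # shape (H, W)
    return feat * gate[None, :, :]
```

Because every gate lies in (0, 1), the module can only suppress or retain activations; injected into the segmentation model, it reweights the feature maps toward the object of interest rather than adding new responses.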

References

1. Caelles, S., Maninis, K.K., Pont-Tuset, J., et al.: 'One-shot video object segmentation'. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017, pp. 5320–5329
2. Perazzi, F., Khoreva, A., Benenson, R., et al.: 'Learning video object segmentation from static images'. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017, pp. 3491–3500
3. Jampani, V., Gadde, R., Gehler, P.V.: 'Video propagation networks'. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017, pp. 3154–3164
4. Cheng, J., Tsai, Y.H., Hung, W.C., et al.: 'Fast and accurate online video object segmentation via tracking parts'. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, June 2018, pp. 7415–7424
5. Yang, L., Wang, Y., Xiong, X., et al.: 'Efficient video object segmentation via network modulation'. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, June 2018, pp. 6499–6507
6. Simonyan, K., Zisserman, A.: 'Very deep convolutional networks for large-scale image recognition'. Int. Conf. on Learning Representations (ICLR), San Diego, CA, USA, May 2015
7. Woo, S., Park, J., Lee, J.Y., et al.: 'CBAM: convolutional block attention module'. European Conf. on Computer Vision (ECCV), Munich, Germany, September 2018, pp. 3–19
8. Hu, J., Shen, L., Sun, G.: 'Squeeze-and-excitation networks'. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, June 2018, pp. 7132–7141
9. Li, H., Xiong, P., An, J., et al.: 'Pyramid attention network for semantic segmentation'. British Machine Vision Conf. (BMVC), Newcastle upon Tyne, UK, September 2018, arXiv preprint arXiv:1805.10180
10. Lin, T.Y., Goyal, P., Girshick, R., et al.: 'Focal loss for dense object detection'. IEEE Int. Conf. on Computer Vision (ICCV), Venice, Italy, October 2017
11. Pont-Tuset, J., Perazzi, F., Caelles, S., et al.: 'The 2017 DAVIS challenge on video object segmentation'. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017
http://iet.metastore.ingenta.com/content/journals/10.1049/el.2019.0304
Correspondence: this article has a corresponding article, 'Divided attention'.