
Video object segmentation via attention-modulating networks

Electronics Letters


This Letter presents an attention-modulating network for video object segmentation that adapts its segmentation model to the annotated frame. Specifically, the authors first develop an efficient visual and spatial attention modulator that quickly modulates the segmentation model to focus on the specific object of interest. They then design a channel and spatial attention module and inject it into the segmentation model to further refine its feature maps. In addition, to fuse multi-scale context information, they construct a feature pyramid attention module that further processes the top-layer feature maps, achieving better pixel-level attention for the high-level feature maps. Finally, to address the sample imbalance issue in training, they employ focal loss, which distinguishes easy samples from difficult ones and accelerates the convergence of network training. Extensive evaluations on the DAVIS2017 dataset show that the proposed approach achieves state-of-the-art performance, outperforming the baseline OSMN by 3.6% and 5.4% in terms of IoU and F-measure, respectively, without fine-tuning.
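As an illustration of the focal loss idea mentioned above, the following is a minimal sketch of the binary focal loss of Lin et al. applied to per-pixel foreground probabilities. It is not the authors' implementation: the function name `binary_focal_loss`, the flat-list input format, and the defaults `gamma=2.0` and `alpha=0.25` (the defaults reported by Lin et al.) are assumptions, since the abstract does not state the settings used in the Letter.

```python
import math


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs  -- predicted foreground probabilities, one per pixel (hypothetical flat list)
    labels -- ground-truth labels, 1 for object and 0 for background
    gamma, alpha -- assumed defaults taken from Lin et al., not from this Letter
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p             # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)**gamma shrinks the loss of well-classified (easy) pixels
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# An easy foreground pixel (p = 0.95) contributes far less than a hard one (p = 0.10).
easy = binary_focal_loss([0.95], [1])
hard = binary_focal_loss([0.10], [1])
```

With `gamma=0` the modulating factor disappears and the expression reduces to an alpha-weighted cross-entropy, which makes the effect of `gamma` easy to check in isolation.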

References

    1. Caelles, S., Maninis, K.K., Pont-Tuset, J., et al.: ‘One-shot video object segmentation’. The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017, pp. 5320–5329.
    2. Perazzi, F., Khoreva, A., Benenson, R., et al.: ‘Learning video object segmentation from static images’. The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017, pp. 3491–3500.
    3. Jampani, V., Gadde, R., Gehler, P.V.: ‘Video propagation networks’. The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017, pp. 3154–3164.
    4. Cheng, J., Tsai, Y.H., Hung, W.C., et al.: ‘Fast and accurate online video object segmentation via tracking parts’. The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, June 2018, pp. 7415–7424.
    5. Yang, L., Wang, Y., Xiong, X., et al.: ‘Efficient video object segmentation via network modulation’. The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, June 2018, pp. 6499–6507.
    6. Simonyan, K., Zisserman, A.: ‘Very deep convolutional networks for large-scale image recognition’. Int. Conf. on Learning Representations (ICLR), San Diego, CA, USA, May 2015.
    7. Woo, S., Park, J., Lee, J.Y., et al.: ‘CBAM: convolutional block attention module’. European Conf. on Computer Vision (ECCV), Munich, Germany, September 2018, pp. 3–19.
    8. Hu, J., Shen, L., Sun, G.: ‘Squeeze-and-excitation networks’. The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, June 2018, pp. 7132–7141.
    9. Li, H., Xiong, P., An, J., et al.: ‘Pyramid attention network for semantic segmentation’. British Machine Vision Conf. (BMVC), Newcastle upon Tyne, UK, September 2018, arXiv preprint arXiv:1805.10180.
    10. Lin, T.Y., Goyal, P., Girshick, R., et al.: ‘Focal loss for dense object detection’. IEEE Int. Conf. on Computer Vision (ICCV), Venice, Italy, October 2017.
    11. Pont-Tuset, J., Perazzi, F., Caelles, S., et al.: ‘The 2017 DAVIS challenge on video object segmentation’. The IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017.
http://iet.metastore.ingenta.com/content/journals/10.1049/el.2019.0304