Online video object segmentation via LRS representation

Video object segmentation has been extensively investigated in computer vision recently because of its wide range of applications. A key factor in segmentation is the construction of spatiotemporal coherence, and inaccurate motion estimation used as a coherence measure usually leads to inaccurate segmentation. To obtain accurate segmentation results, a low-rank sparse (LRS)-based approach is proposed. By treating each superpixel as an element, the algorithm achieves good segmentation accuracy compared with pixel-level algorithms. Each element is represented as a sparse linear combination of dictionary templates, and the algorithm capitalises on the inherent low-rank structure of the representations, which are learnt jointly. The representation coefficients form an affinity matrix that measures the similarity between elements in the current frame and the templates in the dictionary. For video object segmentation, a principled spatiotemporal objective function with an LRS saliency term is formulated to propagate information between frames. Furthermore, an online parameter-updating scheme is proposed to enhance the system's robustness; the online model propagates information forward without the need to access future frames. Evaluations on many challenging sequences demonstrate that the authors' approach outperforms state-of-the-art methods in terms of object segmentation accuracy.
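
To make the representation step concrete, the minimal Python/NumPy sketch below illustrates one plausible reading of the LRS representation described above; it is not the authors' implementation. Superpixel features X are coded over dictionary templates D with coefficients Z penalised by both a nuclear norm (low rank) and an l1 norm (sparsity). The feature dimension, dictionary size, regularisation weights lam and gamma, and the alternating proximal loop (which only approximates the exact proximal operator of the combined penalty) are all illustrative assumptions.

import numpy as np

def svd_shrink(M, tau):
    # Singular-value thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_shrink(M, tau):
    # Element-wise soft thresholding: proximal operator of the l1 norm.
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def lrs_representation(X, D, lam=0.1, gamma=0.1, n_iter=200):
    # Jointly represent superpixel features X (d x n) over dictionary
    # templates D (d x k) with coefficients Z (k x n) that are encouraged
    # to be both low rank and sparse:
    #     min_Z 0.5*||X - D Z||_F^2 + lam*||Z||_* + gamma*||Z||_1
    # solved here with a simple (approximate) alternating proximal loop.
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    Z = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ Z - X)          # gradient of the data-fit term
        Z = Z - step * grad               # gradient step
        Z = svd_shrink(Z, step * lam)     # low-rank proximal step
        Z = soft_shrink(Z, step * gamma)  # sparsity proximal step
    return Z

# Toy usage: 64-dim features for 50 superpixels, 30 dictionary templates.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 30))
X = 0.1 * (D @ rng.standard_normal((30, 50))) + 0.01 * rng.standard_normal((64, 50))
Z = lrs_representation(X, D)

# Affinity between current-frame superpixels and dictionary templates,
# taken here simply as the coefficient magnitudes.
A = np.abs(Z)                 # k x n affinity matrix
saliency = A.sum(axis=0)      # crude per-superpixel foreground score

The column magnitudes of Z serve only as a rough affinity/saliency proxy in this sketch; the paper's actual objective additionally couples the LRS saliency term spatially and temporally across frames and updates the dictionary online.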
