Incremental transfer learning for video annotation via grouped heterogeneous sources

Here, the authors focus on incrementally acquiring heterogeneous knowledge from both the internet and publicly available datasets to reduce the tedious and expensive labelling effort required for video annotation. An incremental transfer learning framework is presented to integrate heterogeneous source knowledge and to update the annotation model incrementally during the transfer learning process. Under this framework, web images and existing action videos form the source domain, providing labelled static and motion information for the target domain videos, respectively. Moreover, the source domain data are partitioned into several groups according to their semantics. Unlike traditional methods, which compare the entire set of target domain videos against each source group, the authors treat the group weights as sample-specific variables and optimise them along with newly added data. Two regularisers prevent the incremental learning process from suffering negative transfer. Experimental results on two large-scale consumer video datasets (i.e. multimedia event detection (MED) and Columbia consumer video (CCV)) demonstrate the effectiveness of the proposed method.
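The abstract alone does not give the paper's objective function, but the core idea of sample-specific group weights with a discrepancy-based guard against negative transfer can be sketched. The following minimal NumPy sketch is illustrative only: the function names, the RBF-kernel maximum mean discrepancy (MMD) term, and the similarity-based weighting rule are assumptions made for exposition, not the authors' actual formulation or solver.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    """Squared MMD: one plausible regulariser for measuring
    source/target distribution mismatch (an assumption here)."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())

def sample_specific_weights(x, groups, gamma=1.0):
    """Hypothetical weighting rule: a single target sample x weights
    each source group by its mean kernel similarity to that group,
    so semantically dissimilar groups contribute little -- a simple
    guard against negative transfer."""
    sims = np.array([rbf_kernel(x[None, :], g, gamma).mean() for g in groups])
    return sims / sims.sum()

# Toy usage: two source groups (e.g. web images vs. action videos,
# already mapped into a shared feature space) and one target sample.
rng = np.random.default_rng(0)
groups = [rng.normal(0.0, 1.0, (50, 8)), rng.normal(2.0, 1.0, (50, 8))]
x = rng.normal(0.0, 1.0, 8)
print(sample_specific_weights(x, groups))  # weights sum to 1
```

Because each target sample carries its own weight vector, newly arriving samples can be weighted and folded into the model without recomputing the weights of earlier samples, which matches the abstract's description of updating the annotation model incrementally as data are added.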

Inspec keywords: learning (artificial intelligence); Internet; video signal processing

Other keywords: heterogeneous knowledge; transfer learning process; video annotation; grouped heterogeneous sources; web images; incremental learning process; negative transfer; large-scale consumer video datasets; internet datasets; source group; incremental transfer learning framework; expensive labelling efforts; entire target domain videos; heterogeneous source knowledge; labelled static motion information; tedious labelling efforts; group weights; annotation model; existing action videos; source domain data

Subjects: Information networks; Video signal processing; Optical, image and video signal processing; Other topics in statistics; Information retrieval techniques; Computer vision and image processing techniques; Knowledge engineering techniques
