In this study, the authors propose a multi-group–multi-class domain adaptation framework that recognises events in consumer videos by leveraging a large number of web videos. The framework extends the multi-class support vector machine (SVM) with a novel data-dependent regulariser, which forces the event classifier to be consistent on consumer videos. To collect web videos, the authors search with several event-related keywords and treat the videos returned by each keyword search as a group. For better performance, they also represent each video as the average of the convolutional neural network (CNN) features of its frames. Comprehensive experiments on two real-world consumer video datasets demonstrate the effectiveness of the method for event recognition in consumer videos.
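The keyword grouping and the averaged-CNN video representation described above can be sketched as follows. This is a minimal illustration, assuming per-frame CNN features have already been extracted; the helper names `video_representation` and `group_by_keyword`, the `(keyword, video_id, frame_features)` input format, the example keywords and the 4096-dimensional feature size are all hypothetical choices, not details given in the abstract.

```python
import numpy as np
from collections import defaultdict

def video_representation(frame_features: np.ndarray) -> np.ndarray:
    """Average per-frame CNN features, shape (n_frames, dim), into one video vector."""
    if frame_features.ndim != 2 or frame_features.shape[0] == 0:
        raise ValueError("expected a non-empty (n_frames, dim) array")
    return frame_features.mean(axis=0)

def group_by_keyword(search_results):
    """Group web videos by the keyword search that returned them.

    search_results: iterable of (keyword, video_id, frame_features) tuples
    (a hypothetical input format). Returns {keyword: (video_ids, X)}, where
    row i of X is the averaged representation of video_ids[i].
    """
    groups = defaultdict(lambda: ([], []))
    for keyword, video_id, frames in search_results:
        ids, feats = groups[keyword]
        ids.append(video_id)
        feats.append(video_representation(frames))
    return {k: (ids, np.vstack(feats)) for k, (ids, feats) in groups.items()}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for CNN frame features; 4096 dims is an assumption.
    results = [
        ("wedding ceremony", "v1", rng.random((30, 4096))),
        ("wedding ceremony", "v2", rng.random((12, 4096))),
        ("birthday party", "v3", rng.random((20, 4096))),
    ]
    groups = group_by_keyword(results)
    print({k: X.shape for k, (_, X) in groups.items()})
    # {'wedding ceremony': (2, 4096), 'birthday party': (1, 4096)}
```

Mean pooling makes every video a fixed-length vector regardless of its number of frames, so videos of different lengths can be fed to the same linear classifier.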

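The abstract does not give the exact form of the data-dependent regulariser. The sketch below assumes one plausible form, modelled on the domain-dependent regulariser of Duan, Xu and Tsang (2012): a linear Crammer-Singer multi-class SVM trained on the pooled labelled web videos, plus a penalty that pulls its decision values on the unlabelled consumer videos towards those of per-group classifiers. The names `multiclass_hinge`, `consistency_reg`, `train` and `predict`, the group weights `gammas`, and all hyperparameter values are hypothetical; the authors' actual objective and solver may differ.

```python
import numpy as np

def multiclass_hinge(W, X, y):
    """Mean Crammer-Singer hinge loss and a subgradient.
    W: (K, d) class weights, X: (n, d) features, y: (n,) integer labels."""
    n = X.shape[0]
    rows = np.arange(n)
    scores = X @ W.T                                   # (n, K) decision values
    margins = scores + 1.0 - scores[rows, y][:, None]  # 1 + s_k - s_y for every k
    margins[rows, y] = 0.0                             # true class contributes 0
    worst = margins.argmax(axis=1)                     # most violating class
    loss = margins[rows, worst]                        # max(0, max_{k!=y} 1 + s_k - s_y)
    grad = np.zeros_like(W)
    active = loss > 0
    np.add.at(grad, worst[active], X[active])          # d loss / d w_worst = +x
    np.add.at(grad, y[active], -X[active])             # d loss / d w_y = -x
    return loss.mean(), grad / n

def consistency_reg(W, Xt, group_scores, gammas):
    """Assumed data-dependent regulariser on unlabelled consumer videos Xt (m, d):
    R(W) = 1/(2m) * sum_g gamma_g * ||Xt W^T - F_g||_F^2,
    where F_g (m, K) holds group g's decision values on Xt."""
    m = Xt.shape[0]
    St = Xt @ W.T
    value, grad = 0.0, np.zeros_like(W)
    for F, gamma in zip(group_scores, gammas):
        E = St - F                                     # disagreement with group g
        value += 0.5 * gamma * np.sum(E ** 2) / m
        grad += gamma * (E.T @ Xt) / m                 # (K, d)
    return value, grad

def train(X, y, K, Xt=None, group_scores=(), gammas=(),
          lam=1e-2, theta=1.0, lr=0.1, iters=500):
    """Subgradient descent on lam/2 ||W||^2 + hinge(W; X, y) + theta * R(W; Xt)."""
    W = np.zeros((K, X.shape[1]))
    for _ in range(iters):
        _, g = multiclass_hinge(W, X, y)
        g += lam * W
        if Xt is not None and len(group_scores) > 0:
            g += theta * consistency_reg(W, Xt, group_scores, gammas)[1]
        W -= lr * g
    return W

def predict(W, X):
    """Assign each video to the event class with the highest decision value."""
    return (X @ W.T).argmax(axis=1)
```

Under these assumptions, one would first train one classifier per keyword group with `theta=0`, compute `F_g = Xt @ W_g.T` on the consumer videos for each group, and then pass those matrices as `group_scores` when training the final classifier on all web videos. Plain subgradient descent is used only to keep the sketch short; a dedicated SVM solver would be preferable at scale.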

Multi-group–multi-class domain adaptation for event recognition
