© The Institution of Engineering and Technology
In this study, the authors propose a multi-group–multi-class domain adaptation framework that recognises events in consumer videos by leveraging a large number of web videos. The framework extends the multi-class support vector machine with a novel data-dependent regulariser, which forces the event classifier to become consistent on consumer videos. To obtain web videos, the authors search with several event-related keywords and refer to the videos returned by one keyword search as a group. They also adopt a video representation that averages the convolutional neural network features of the video frames for better performance. Comprehensive experiments on two real-world consumer video datasets demonstrate the effectiveness of the method for event recognition in consumer videos.
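The video representation mentioned above, the average of per-frame convolutional neural network features, can be illustrated with a minimal sketch. It assumes the per-frame features have already been extracted as equal-length vectors; the function name `average_frame_features` and the toy values are hypothetical and are not taken from the authors' implementation.

```python
from typing import List, Sequence


def average_frame_features(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of per-frame feature vectors.

    Each inner sequence is assumed to be the CNN feature vector of one
    video frame (an assumption for illustration; extraction is not shown).
    """
    if not frame_features:
        raise ValueError("at least one frame feature vector is required")

    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame feature vectors must have the same length")

    n_frames = len(frame_features)
    # Average each feature dimension over all frames of the video.
    return [sum(f[i] for f in frame_features) / n_frames for i in range(dim)]


# Toy example: a "video" with two frames and 2-dimensional features.
video_descriptor = average_frame_features([[1.0, 2.0], [3.0, 4.0]])
```

The result is a single fixed-length vector per video, regardless of the number of frames, which is what allows a standard classifier such as a support vector machine to be trained on videos of different lengths.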