Back-dropout transfer learning for action recognition

Transfer learning aims to adapt a model learned on a source dataset to a target dataset. It is especially beneficial when annotating the target dataset is expensive or infeasible, and it has demonstrated powerful learning capabilities in a variety of vision tasks. Even so, how best to adapt a model learned on the source dataset to the target dataset remains an open question. One major challenge is to prevent dataset bias from degrading classification performance: dataset bias exists when two images of the same category, drawn from different datasets, are not classified as the same. To address this problem, a transfer learning algorithm called negative back-dropout transfer learning (NB-TL) is proposed, which takes misclassified images and applies a back-dropout strategy to them to penalize errors. Experimental results demonstrate the effectiveness of the proposed algorithm. In particular, the authors evaluate NB-TL on the UCF101 action recognition dataset, achieving an 88.9% recognition rate.
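The abstract does not spell out the mechanism, but the idea can be sketched. Below is a minimal, hypothetical PyTorch-style illustration of one plausible reading of NB-TL: misclassified ("negative") samples are identified after the forward pass, and dropout is applied to their gradient path during backpropagation, so the error penalty on those samples is stochastically thinned rather than fully propagated. The names (`BackDropout`, `nb_tl_step`), the dropout rate `p`, and the masking details are illustrative assumptions, not the authors' formulation.

```python
import torch


class BackDropout(torch.autograd.Function):
    """Identity in the forward pass; randomly zeroes a fraction p of the
    incoming gradient in the backward pass (the assumed 'back-dropout')."""

    @staticmethod
    def forward(ctx, x, p):
        ctx.p = p
        return x

    @staticmethod
    def backward(ctx, grad_output):
        mask = (torch.rand_like(grad_output) > ctx.p).float()
        # Rescale so the expected gradient magnitude is preserved.
        return grad_output * mask / (1.0 - ctx.p), None


def nb_tl_step(model, criterion, optimizer, inputs, labels, p=0.5):
    """One fine-tuning step. Back-dropout is applied only to the logits of
    misclassified samples, thinning their error signal before it reaches
    the network weights; correctly classified samples train as usual."""
    logits = model(inputs)
    with torch.no_grad():
        wrong = logits.argmax(dim=1) != labels  # misclassified-sample mask

    loss = logits.new_zeros(())
    if wrong.any():
        # Penalize errors through a randomly masked gradient path.
        loss = loss + criterion(BackDropout.apply(logits[wrong], p), labels[wrong])
    if (~wrong).any():
        loss = loss + criterion(logits[~wrong], labels[~wrong])

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a transfer-learning run, `model` would be a network pre-trained on the source dataset (e.g. ImageNet) being fine-tuned on the target dataset, with `criterion` a standard cross-entropy loss; only the treatment of misclassified samples differs from ordinary fine-tuning.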
