Action recognition based on motion of oriented magnitude patterns and feature selection

The authors introduce a novel action-recognition system that incorporates the discriminative motion of oriented magnitude patterns (MOMP) descriptor into simple yet efficient techniques. The descriptor both investigates the relations of local gradient distributions in neighbourhoods across consecutive frames and characterises how information changes across different orientations. The proposed system makes two main contributions: (i) the authors apply feature post-processing, principal component analysis followed by vector of locally aggregated descriptors (VLAD) encoding, to de-correlate the MOMP descriptor and reduce its dimension, which speeds up the algorithm; (ii) they then apply feature selection (statistical dependency, mutual information, and minimal redundancy maximal relevance) to find the best feature subset, improving performance and reducing the computational expense of classification with support vector machines. Experimental results on four data sets, Weizmann (98.4%), KTH (96.3%), UCF Sport (82.0%), and HMDB51 (31.5%), demonstrate the efficiency of the authors' algorithm.
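The processing chain described above (local descriptors → PCA de-correlation → VLAD encoding → feature selection → SVM) can be sketched as follows. This is not the authors' implementation: it is a minimal illustration using scikit-learn on synthetic stand-in descriptors, with mutual-information ranking standing in for the paper's three selection criteria; all sizes (descriptor dimension, codebook size, number of kept features) are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for MOMP descriptors: each "video" yields a set of
# local descriptors (n_desc x dim), with class-dependent statistics.
n_videos, n_desc, dim, n_classes = 40, 60, 32, 2
y = rng.integers(0, n_classes, n_videos)
videos = [rng.normal(loc=c, size=(n_desc, dim)) for c in y]

# (i) PCA to de-correlate descriptors and reduce their dimension.
pca = PCA(n_components=16).fit(np.vstack(videos))
videos_p = [pca.transform(v) for v in videos]

# VLAD encoding: accumulate residuals to the nearest codeword,
# then power- and L2-normalise (a common VLAD post-processing step).
k = 4
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(videos_p))

def vlad(desc):
    assign = km.predict(desc)
    enc = np.zeros((k, desc.shape[1]))
    for i in range(k):
        if np.any(assign == i):
            enc[i] = (desc[assign == i] - km.cluster_centers_[i]).sum(axis=0)
    enc = np.sign(enc) * np.sqrt(np.abs(enc))        # power normalisation
    return (enc / (np.linalg.norm(enc) + 1e-12)).ravel()

X = np.array([vlad(v) for v in videos_p])

# (ii) Feature selection (mutual information here, as one of the paper's
# criteria), then classification with a linear SVM.
sel = SelectKBest(mutual_info_classif, k=32).fit(X, y)
clf = LinearSVC(C=1.0).fit(sel.transform(X), y)
print("train accuracy:", clf.score(sel.transform(X), y))
```

The VLAD stage turns a variable-size set of local descriptors into one fixed-length vector per video, which is what allows a standard vector classifier such as an SVM to be used downstream.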

Inspec keywords: image classification; image sequences; object recognition; principal component analysis; feature selection; support vector machines; image motion analysis

Other keywords: aggregated descriptors encoding; motion-of-oriented magnitude patterns descriptor; UCF Sport data set; feature subset; feature selection; KTH data set; Weizmann data set; MOMP descriptor decorrelation; minimal redundancy maximal relevance; feature post-processing principal component analysis; information changing; statistical dependency; HMDB51 data set; support vector machine techniques; local gradient distributions; action recognition; mutual information; consecutive image sequences

Subjects: Other topics in statistics; Computer vision and image processing techniques; Image recognition; Knowledge engineering techniques
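Of the three selection criteria the abstract names, minimal redundancy maximal relevance (mRMR) is the least obvious: it greedily adds the feature whose mutual information with the class label (relevance) most exceeds its average mutual information with the features already chosen (redundancy). The sketch below is an assumed, simplified greedy implementation on synthetic data, not the authors' code; the MI estimates come from scikit-learn's nearest-neighbour estimators.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, n_select, random_state=0):
    """Greedy minimal-redundancy maximal-relevance selection.
    Returns indices of the selected feature columns."""
    n_feat = X.shape[1]
    # Relevance: MI between each feature and the class label.
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            # Redundancy: mean MI between candidate j and selected features.
            red = np.mean([mutual_info_regression(
                X[:, [s]], X[:, j], random_state=random_state)[0]
                for s in selected])
            score = relevance[j] - red
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return np.array(selected)

# Two redundant informative features (both track the label) plus four
# noise features; mRMR should rank an informative feature first.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 120)
informative = y[:, None] + 0.3 * rng.normal(size=(120, 2))
X = np.hstack([informative, rng.normal(size=(120, 4))])
chosen = mrmr(X, y, n_select=2)
print("selected features:", chosen)
```

Because the second informative feature is highly redundant with the first, mRMR may prefer a low-redundancy feature over it at the second step; that trade-off is exactly what distinguishes mRMR from plain mutual-information ranking.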

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2017.0282