Spatio-temporal multi-scale motion descriptor from a spatially-constrained decomposition for online action recognition

This study presents a spatio-temporal motion descriptor that is computed from a spatially-constrained decomposition and applied to online classification and recognition of human activities. The method starts by computing a dense optical flow without explicit spatial regularisation. Potential human actions are detected at each frame as spatially consistent moving regions of interest (RoIs). Each RoI is then sequentially partitioned into small overlapped subregions of different sizes, and each subregion is characterised by a set of flow orientation histograms. A particular RoI is described over time by a set of recursively calculated statistics that accumulate the temporal history of the orientation histograms, forming the action descriptor. At any time, the whole descriptor can be extracted and labelled by a previously trained support vector machine. The method was evaluated on three public datasets: (i) the ViSOR dataset, used for global classification (average accuracy of 95%) and for recognition in long sequences (average per-frame accuracy of 92.3%); (ii) the KTH dataset, used for global classification; and (iii) the UT-Interaction datasets, used for the recognition task (average accuracy of 80%, at frame rate).
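The two core computations sketched in the abstract — a magnitude-weighted histogram of flow orientations per subregion, and a recursively updated temporal statistic over those histograms — can be illustrated as follows. This is a minimal NumPy sketch under assumed conventions (8 orientation bins, L1 normalisation, a running mean as the recursive statistic); the function names and parameters are illustrative, not the authors' implementation.

```python
import numpy as np

def flow_orientation_histogram(fx, fy, n_bins=8):
    """Magnitude-weighted histogram of optical-flow orientations for one subregion.

    fx, fy : 2-D arrays with the horizontal/vertical flow components.
    Returns an L1-normalised histogram over orientations in [0, 2*pi).
    """
    ang = np.arctan2(fy, fx) % (2 * np.pi)   # orientation of each flow vector
    mag = np.hypot(fx, fy)                   # magnitude used as the bin weight
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, 2 * np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist

def running_mean_update(mean, hist, t):
    """One recursive update of a temporal statistic (here: the online mean).

    mean_t = mean_{t-1} + (hist_t - mean_{t-1}) / t, so no per-frame
    history needs to be stored -- suitable for online recognition.
    """
    return mean + (hist - mean) / t
```

In use, each new frame's per-subregion histograms would update the running statistics in constant time, and the concatenated statistics at the current frame would form the feature vector handed to the SVM.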

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2016.0055