Spatio-temporal multi-scale motion descriptor from a spatially-constrained decomposition for online action recognition


IET Computer Vision

This study presents a spatio-temporal motion descriptor, computed from a spatially-constrained decomposition, for online classification and recognition of human activities. The method starts by computing a dense optical flow without explicit spatial regularisation. Potential human actions are detected at each frame as spatially consistent moving regions of interest (RoIs). Each RoI is then sequentially partitioned into a spatial representation of small, overlapping subregions of different sizes, and each subregion is characterised by a set of flow-orientation histograms. A particular RoI is then described over time by a set of recursively computed statistics that summarise the temporal history of the orientation histograms, forming the action descriptor. At any time, the whole descriptor can be extracted and labelled by a previously trained support vector machine. The method was evaluated on three public datasets: (i) the ViSOR dataset was used for global classification, obtaining an average accuracy of 95%, and for recognition in long sequences, achieving an average per-frame accuracy of 92.3%; (ii) the KTH dataset was used for global classification; and (iii) the UT-Interaction datasets were used for the recognition task, obtaining an average accuracy of 80% (at frame rate).
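Two ingredients of the pipeline described above lend themselves to a short illustration: the magnitude-weighted flow-orientation histogram computed per subregion, and the recursively updated statistics that summarise its temporal history without buffering past frames. The sketch below is not the authors' implementation; it is a minimal NumPy rendition under stated assumptions (a precomputed `(H, W, 2)` flow field, a fixed bin count, and a simple incremental mean as the recursive statistic — all names and choices are illustrative).

```python
import numpy as np

def flow_orientation_histogram(flow, bins=8):
    """Magnitude-weighted histogram of optical-flow orientations.
    `flow` is an (H, W, 2) array of per-pixel (dx, dy) displacements."""
    dx, dy = flow[..., 0], flow[..., 1]
    mag = np.hypot(dx, dy)                           # flow magnitude per pixel
    ang = np.arctan2(dy, dx) % (2 * np.pi)           # orientation in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, 2 * np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist       # normalised histogram

class RunningHistogramStats:
    """Recursively updated mean of per-frame histograms.
    Only the current mean and a frame count are stored, so the descriptor
    can be read out at any frame, as in online recognition."""
    def __init__(self, bins=8):
        self.n = 0
        self.mean = np.zeros(bins)

    def update(self, hist):
        self.n += 1
        self.mean += (hist - self.mean) / self.n     # incremental mean update
        return self.mean
```

At each frame, the histograms of all subregions would be updated this way and concatenated into the action descriptor that is handed to the trained SVM.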

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2016.0055