Probability-based method for boosting human action recognition using scene context

In this study, the authors investigate the possibility of boosting action recognition performance by exploiting the associated scene context. Towards this end, the authors model the scene as a mid-level layer that bridges low-level action descriptors and action categories. This is achieved via a scene topic model, in which hybrid visual descriptors, including spatial–temporal action features and scene descriptors, are first extracted from a video sequence. The authors then learn a joint probability distribution between scene and action using a naive Bayes nearest neighbour algorithm, which is used to infer action categories online in combination with off-the-shelf action recognition algorithms. The authors demonstrate the advantages of their approach by comparing it with state-of-the-art approaches on several action recognition benchmarks.

Inspec keywords: image sequences; video signal processing; feature extraction; Bayes methods; image motion analysis

Other keywords: joint probability distribution; video sequence; naive Bayes nearest neighbour algorithm; boosting human action recognition; hybrid visual descriptors; spatial temporal action features; bridge action descriptors; probability based method; action categories; associated scene context

Subjects: Other topics in statistics; Optical, image and video signal processing; Video signal processing; Computer vision and image processing techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2015.0420