Probability-based method for boosting human action recognition using scene context


In this study, the authors investigate whether action recognition performance can be boosted by exploiting the associated scene context. To this end, they model the scene as a mid-level layer that bridges action descriptors and action categories. This is achieved via a scene topic model: hybrid visual descriptors, including spatial–temporal action features and scene descriptors, are first extracted from a video sequence. The authors then learn a joint probability distribution between scene and action using a naive Bayes nearest neighbour algorithm, and use this distribution to infer action categories online in combination with off-the-shelf action recognition algorithms. They demonstrate the advantages of their approach by comparing it with state-of-the-art approaches on several action recognition benchmarks.
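The fusion idea described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes we already have (hypothetical) per-video action scores from a base classifier, an estimated scene distribution P(scene | video), and a learned table P(action | scene); the scene-conditioned prior is then combined multiplicatively with the base scores.

```python
import numpy as np

def fuse_scene_action(action_scores, scene_probs, action_given_scene):
    """Combine a base action classifier's scores with scene context.

    action_scores: (n_actions,) scores from an off-the-shelf action
        recognizer, treated as an unnormalised P(action | descriptors).
    scene_probs: (n_scenes,) estimated P(scene | video).
    action_given_scene: (n_scenes, n_actions) table of P(action | scene),
        e.g. learned from scene/action co-occurrences in training data.
    Returns the index of the most probable action after fusion.
    """
    # Scene-conditioned prior over actions: sum_s P(a | s) P(s | video)
    scene_prior = scene_probs @ action_given_scene
    # Normalise the base scores to a distribution before multiplying
    p_action = action_scores / action_scores.sum()
    fused = p_action * scene_prior
    return int(np.argmax(fused))

# Toy example: 2 scenes, 3 actions.  The base classifier is nearly
# uniform, but the scene evidence tips the decision towards action 0.
scene_probs = np.array([0.8, 0.2])           # video resembles scene 0
action_given_scene = np.array([[0.6, 0.3, 0.1],
                               [0.1, 0.2, 0.7]])
action_scores = np.array([0.3, 0.4, 0.3])
print(fuse_scene_action(action_scores, scene_probs, action_given_scene))  # → 0
```

The multiplicative combination corresponds to treating the scene-derived prior and the descriptor-based likelihood as independent sources of evidence; the actual joint inference in the paper is carried out with a naive Bayes nearest neighbour model.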

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2015.0420