Recognising human interaction from videos by a discriminative model

This study addresses the problem of recognising human interactions between two people. The main difficulties lie in the partial occlusion of body parts and the motion ambiguity of interactions. The authors observed that the interdependencies existing at both the action level and the body-part level can greatly help disambiguate similar individual movements and facilitate human interaction recognition. Accordingly, they propose a novel discriminative method, which models the action of each person by a large-scale global feature and local body-part features, to capture such interdependencies for recognising the interactions of two people. A variant of the multi-class AdaBoost method is proposed to automatically discover class-specific discriminative three-dimensional body parts. The proposed approach is tested on the authors' newly introduced BIT-interaction dataset and on the UT-interaction dataset. The results show that the proposed model is effective in recognising human interactions.
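To make the part-discovery step concrete, the following is a minimal sketch of a SAMME-style multi-class AdaBoost loop (after Zhu et al., 'Multi-class AdaBoost', 2009, which the paper builds on) over one-dimensional decision stumps. The stump weak learner, and the idea that each feature dimension stands in for one candidate body-part descriptor, are illustrative assumptions; the function names (fit_stump, samme_boost, samme_predict) are hypothetical and not taken from the paper.

```python
# SAMME-style multi-class AdaBoost sketch (Zhu et al., 2009).
# Assumption: each weak learner is a decision stump on a single feature
# dimension, standing in for a classifier built from one candidate body
# part; the authors' actual weak classifiers are not reproduced here.
import numpy as np

def fit_stump(X, y, w, n_classes):
    """Exhaustively pick the (feature, threshold) stump with the lowest
    weighted error; each side of the split predicts its weighted-majority
    class.  X: (n, d) floats, y: (n,) int labels, w: (n,) sample weights."""
    best = None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left = X[:, f] <= t
            classes = []
            for side in (left, ~left):
                counts = np.bincount(y[side], weights=w[side],
                                     minlength=n_classes)
                classes.append(int(counts.argmax()) if side.any() else 0)
            pred = np.where(left, classes[0], classes[1])
            err = float(w[pred != y].sum())
            if best is None or err < best[0]:
                best = (err, f, t, classes[0], classes[1])
    return best

def samme_boost(X, y, n_rounds=20):
    """Return a list of weighted stumps; in this toy setting the selected
    feature indices play the role of the discriminative parts."""
    n, n_classes = len(y), int(y.max()) + 1
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(n_rounds):
        err, f, t, c_left, c_right = fit_stump(X, y, w, n_classes)
        err = max(err, 1e-10)
        if err >= 1.0 - 1.0 / n_classes:   # no better than random guessing
            break
        # SAMME's extra log(K - 1) term is what lets boosting proceed when
        # a weak learner only needs accuracy above 1/K, not above 1/2.
        alpha = np.log((1.0 - err) / err) + np.log(n_classes - 1)
        pred = np.where(X[:, f] <= t, c_left, c_right)
        w *= np.exp(alpha * (pred != y))   # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, f, t, c_left, c_right))
    return ensemble

def samme_predict(ensemble, X, n_classes):
    """Classify by the alpha-weighted vote of all selected stumps."""
    votes = np.zeros((len(X), n_classes))
    for alpha, f, t, c_left, c_right in ensemble:
        pred = np.where(X[:, f] <= t, c_left, c_right)
        votes[np.arange(len(X)), pred] += alpha
    return votes.argmax(axis=1)
```

The connection to the paper's claim is the reweighting step: each round up-weights the training samples the current stumps misclassify, so later rounds are pushed towards features (here, stand-ins for body parts) that resolve the remaining class-specific ambiguities rather than those that merely separate the easy cases.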
