Recognising human interaction from videos by a discriminative model
- Authors: Yu Kong¹, Wei Liang¹, Zhen Dong¹, Yunde Jia¹
- Affiliations:
  1. Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing 100081, People's Republic of China
- Source: IET Computer Vision, Volume 8, Issue 4, August 2014, pp. 277–286
- DOI: 10.1049/iet-cvi.2013.0042; Print ISSN 1751-9632; Online ISSN 1751-9640
This study addresses the problem of recognising human interactions between two people. The main difficulties lie in the partial occlusion of body parts and the motion ambiguity in interactions. The authors observed that the interdependencies existing at both the action level and the body-part level can greatly help disambiguate similar individual movements and facilitate human interaction recognition. Accordingly, they propose a novel discriminative method, which models the action of each person with a large-scale global feature and local body-part features, to capture such interdependencies for recognising interactions between two people. A variant of the multi-class AdaBoost method is proposed to automatically discover class-specific discriminative three-dimensional body parts. The proposed approach is tested on the authors' newly introduced BIT-Interaction dataset and the UT-Interaction dataset. The results show that the proposed model is quite effective in recognising human interactions.
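The part-discovery step described in the abstract builds on multi-class AdaBoost; the SAMME formulation (Zhu et al., cited by the paper) extends the binary weight update with a log(K−1) term so that weak learners only need to beat 1/K chance. The following is a minimal sketch of that boosting-based selection, under stated assumptions: each candidate feature column stands in for a body-part descriptor, and a nearest-weighted-centroid stump serves as the weak learner. The stump, the toy data, and the per-round feature selection are illustrative choices, not the authors' actual 3D body-part model.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_weak(x, y, w, K):
    """Weak learner: nearest-weighted-centroid stump on one feature column."""
    centroids = np.array([np.average(x[y == k], weights=w[y == k]) for k in range(K)])
    pred = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
    err = w[pred != y].sum()  # weighted training error
    return centroids, pred, err

def samme_select_parts(X, y, K, n_rounds):
    """SAMME-style multi-class AdaBoost: each round keeps the single feature
    (standing in for a body-part descriptor) with the lowest weighted error."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    ensemble = []  # (feature index, class centroids, vote weight alpha)
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            centroids, pred, err = fit_weak(X[:, j], y, w, K)
            if best is None or err < best[3]:
                best = (j, centroids, pred, err)
        j, centroids, pred, err = best
        err = max(err, 1e-10)
        if err >= 1.0 - 1.0 / K:  # no better than chance for K classes: stop
            break
        alpha = np.log((1.0 - err) / err) + np.log(K - 1.0)  # SAMME weight
        w *= np.exp(alpha * (pred != y))  # upweight misclassified samples
        w /= w.sum()
        ensemble.append((j, centroids, alpha))
    return ensemble

def samme_predict(ensemble, X, K):
    """Weighted vote of the selected per-feature stumps."""
    votes = np.zeros((X.shape[0], K))
    for j, centroids, alpha in ensemble:
        pred = np.argmin(np.abs(X[:, j][:, None] - centroids[None, :]), axis=1)
        votes[np.arange(X.shape[0]), pred] += alpha
    return votes.argmax(axis=1)

# Toy data (hypothetical): 3 classes, 10 candidate "parts"; only part 0 is informative.
K, n = 3, 300
y = rng.integers(0, K, n)
X = rng.normal(size=(n, 10))
X[:, 0] += 3.0 * y  # part 0 separates the classes

ensemble = samme_select_parts(X, y, K, n_rounds=5)
acc = (samme_predict(ensemble, X, K) == y).mean()
```

On this toy problem the first round picks the informative feature, mirroring how boosting can surface the discriminative parts for each class; the paper's actual weak learners operate on 3D body-part features rather than raw feature columns.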
Inspec keywords: object recognition; learning (artificial intelligence); image motion analysis; video signal processing; statistical analysis; hidden feature removal; feature extraction
Other keywords: 3D automatic class specific discriminative body part discovery; UT-interaction dataset; body part level; discriminative model; BIT-interaction dataset; video processing; local body part feature; partial occlusion; human interaction recognition; action level; motion ambiguity; multiclass Adaboost method; global feature
Subjects: Knowledge engineering techniques; Other topics in statistics; Computer vision and image processing techniques; Image recognition; Video signal processing