Multiple subsequence combination in human action recognition

Human action recognition is an active research area with applications in several domains such as visual surveillance, video retrieval and human–computer interaction. Current approaches assign action labels to video streams by considering the whole video as a single sequence but, in some cases, the large variability between frames may lead to misclassifications. The authors propose a multiple subsequence combination (MSC) method that divides the video into several consecutive subsequences. It applies part-based and bag-of-visual-words approaches to classify each subsequence, and then combines the subsequence labels to assign an action label to the video. The proposed approach was tested on the KTH, UCF sports, YouTube and Robo-Kitchen datasets, which differ widely in video length, object appearance and pose, object scale, viewpoint, background, and in the number, type and complexity of the actions performed. Two main results were achieved. First, the MSC approach performs better than classifying the video as a whole, even when few subsequences are used. Second, the approach is robust and stable since, for each dataset, its performance is comparable to that of state-of-the-art part-based approaches.
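The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the per-subsequence classifier is passed in as a black box (standing in for the part-based/bag-of-visual-words stage), and majority voting is assumed as one plausible label-combination rule.

```python
from collections import Counter

def split_into_subsequences(frames, n_subsequences):
    """Divide a list of frames into n consecutive, roughly equal chunks."""
    chunk = max(1, len(frames) // n_subsequences)
    parts = [frames[i * chunk:(i + 1) * chunk] for i in range(n_subsequences - 1)]
    parts.append(frames[(n_subsequences - 1) * chunk:])  # last chunk takes the remainder
    return parts

def combine_labels(labels):
    """Combine per-subsequence labels; majority vote is one possible rule."""
    return Counter(labels).most_common(1)[0][0]

def classify_video(frames, n_subsequences, classify_subsequence):
    """Label each consecutive subsequence, then combine into one video label."""
    subsequences = split_into_subsequences(frames, n_subsequences)
    labels = [classify_subsequence(sub) for sub in subsequences]
    return combine_labels(labels)
```

For example, with a stub classifier that labels a subsequence from its first frame index, `classify_video(list(range(10)), 3, stub)` splits the video into chunks of frames 0–2, 3–5 and 6–9 and returns the majority label of the three.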

Inspec keywords: image sequences; pose estimation; video signal processing; object recognition; computational complexity

Other keywords: actions complexity; Robo-Kitchen datasets; MSC; video streams; UCF sports; bag of visual words approaches; consecutive subsequences; multiple subsequence combination; object appearance; object pose; KTH; part-based approaches; YouTube; human action recognition

Subjects: Computational complexity; Image recognition; Computer vision and image processing techniques; Video signal processing

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2013.0015