Dimensionality reduction of Fisher vectors for human action recognition


Automatic analysis of human behaviour in large collections of videos is rapidly gaining interest, even more so with the advent of file-sharing sites such as YouTube. One notable trend is that the feature vectors used for human action recognition from videos have grown enormously over the last five years, to the order of ∼100K–500K dimensions. One likely reason is the growing number of action classes and videos, and hence the need for discriminative features, which tend to be higher-dimensional for larger databases. In this study, the authors review and investigate feature projection as a means of reducing the dimensionality of these high-dimensional feature vectors and demonstrate its effectiveness in terms of recognition performance. They hypothesise that dimensionality reduction techniques often unearth latent structures in the feature space and are effective in applications such as the fusion of high-dimensional features of different types and action recognition in untrimmed videos. All experiments are conducted within a Bag-of-Words framework for consistency, and results are presented on large-class benchmark databases such as the HMDB51 and UCF101 datasets.
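The kind of feature projection discussed above can be illustrated with a minimal PCA sketch. This is not the authors' pipeline: the dimensions are hypothetical stand-ins (5,000 input dimensions for the ∼100K–500K Fisher vectors used in practice, 128 output dimensions), and the data is random, but the projection step itself (centre, then project onto the leading principal directions) is the standard technique.

```python
import numpy as np

# Hypothetical setup: 200 videos, each encoded as a high-dimensional
# Fisher vector. 5000 dims stands in for the ~100K-500K used in practice.
rng = np.random.default_rng(0)
fisher_vectors = rng.standard_normal((200, 5000))

# PCA via SVD: centre the data, then project onto the top-k right
# singular vectors (the leading principal directions).
k = 128
centred = fisher_vectors - fisher_vectors.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
reduced = centred @ vt[:k].T

print(reduced.shape)  # prints (200, 128)
```

The reduced vectors can then be fed to a linear classifier (e.g. a linear SVM), which is both faster to train and less memory-hungry than one operating on the full-dimensional Fisher vectors.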

