Dimensionality reduction of Fisher vectors for human action recognition

Automatic analysis of human behaviour in large collections of videos is rapidly gaining interest, even more so with the advent of file-sharing sites such as YouTube. From one perspective, it can be observed that the size of feature vectors used for human action recognition from videos has increased enormously in the last five years, to the order of ∼100–500K dimensions. One possible reason is the growing number of action classes/videos and hence the need for discriminative features, which tend to be higher-dimensional for larger databases. In this study, the authors review and investigate feature projection as a means of reducing the dimensionality of these high-dimensional feature vectors, and show its effectiveness in terms of performance. They hypothesise that dimensionality reduction techniques often unearth latent structures in the feature space and are effective in applications such as the fusion of high-dimensional features of different types and action recognition in untrimmed videos. All experiments are conducted within a Bag-of-Words framework for consistency, and results are presented on large-class benchmark databases, namely the HMDB51 and UCF101 datasets.
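The feature-projection idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration only: synthetic random vectors stand in for real Fisher vectors, and plain PCA (via SVD) stands in for the family of projection techniques the study investigates; the dimensions chosen are arbitrary.

```python
import numpy as np

# Hypothetical stand-in for high-dimensional Fisher vectors:
# one row per video, n_dims features per video.
rng = np.random.default_rng(0)
n_videos, n_dims, n_components = 200, 1000, 64
X = rng.standard_normal((n_videos, n_dims))

# PCA via SVD: centre the data, then project onto the top
# principal directions to obtain a compact representation.
X_centred = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centred, full_matrices=False)
X_reduced = X_centred @ Vt[:n_components].T

print(X_reduced.shape)  # (200, 64)
```

In practice the projected vectors, rather than the original high-dimensional ones, would then be fed to the classifier, which is where the fusion and untrimmed-video benefits discussed above would be evaluated.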
