Action recognition from mutually incoherent pose bases in static image

Action recognition in static images is challenging. The authors propose mutually incoherent pose bases, which are implicit poselet co-occurrences learned by dictionary training, to describe body pose. Poselets in a pose basis are not constrained in spatial position or number, so a pose basis can describe body pose more flexibly than a k-poselet. In their method, the body pose in an image is represented by a sparse linear combination of pose bases, because the pose in an action varies while each image captures only a snapshot from a single viewpoint. In dictionary training, the challenge is to stabilise the sparse representation, which is the input to a support vector machine (SVM) for action recognition, because the original pose signal is ambiguous and the dictionary is an overcomplete matrix. Their solution is to add cumulative coherence as a penalty in the objective function, which induces the pose bases to become mutually incoherent. They evaluate the method on two popular datasets, and the experimental results show that the pose representation achieves encouraging performance in action recognition. Furthermore, they empirically exploit the complementary role of the local pose feature alongside deep convolutional neural network features from the holistic image. Experimental results demonstrate a marked performance improvement when the two features are concatenated.
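The abstract does not spell out the objective function, so the following is only a minimal sketch of the quantity it penalises. It assumes the standard definition of cumulative coherence (the Babel function) for a dictionary with unit-norm atoms: for each atom, sum its m largest absolute inner products with the other atoms, then take the maximum over atoms. The function names and the toy dictionaries are hypothetical and are not taken from the paper.

```python
import math


def normalise(atom):
    """Scale an atom (a list of floats) to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in atom))
    return [x / norm for x in atom]


def cumulative_coherence(atoms, m):
    """Babel function mu_1(m) of a dictionary given as a list of atoms.

    For each atom, sum its m largest absolute inner products with the
    other atoms; return the largest such sum over all atoms.
    """
    unit = [normalise(a) for a in atoms]
    worst = 0.0
    for k, d_k in enumerate(unit):
        overlaps = sorted(
            (abs(sum(p * q for p, q in zip(d_k, d_j)))
             for j, d_j in enumerate(unit) if j != k),
            reverse=True,
        )
        worst = max(worst, sum(overlaps[:m]))
    return worst


# Toy overcomplete dictionaries (assumed data): 3 atoms in a 2-D space.
# "coherent" has two nearly parallel atoms; "spread" has atoms 120 degrees apart.
coherent = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
spread = [[1.0, 0.0],
          [-0.5, math.sqrt(3) / 2],
          [-0.5, -math.sqrt(3) / 2]]

mu_coherent = cumulative_coherence(coherent, 1)
mu_spread = cumulative_coherence(spread, 1)
print(f"coherent dictionary: mu_1(1) = {mu_coherent:.3f}")
print(f"spread dictionary:   mu_1(1) = {mu_spread:.3f}")
```

A dictionary with nearly parallel atoms scores close to 1, while well-separated atoms score lower. A penalty on this quantity during dictionary training would therefore push the learned bases apart, which is the intuition behind calling them mutually incoherent.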

Inspec keywords: support vector machines; pose estimation; matrix algebra; neural nets; image representation

Other keywords: dictionary training; implicit poselet co-occurrences; SVM; static image; objective function; local pose feature; sparse linear pose bases combination; mutually incoherent pose bases; sparse representation; overcomplete matrix; cumulative coherence; action recognition; pose representation; deep convolutional neural network features

Subjects: Neural computing techniques; Algebra; Image recognition; Knowledge engineering techniques; Computer vision and image processing techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2017.0233