
Action recognition from mutually incoherent pose bases in static image


Action recognition in static images is challenging. The authors propose mutually incoherent pose bases, which are implicit poselet co-occurrences learned by dictionary training to describe body pose. The poselets in a pose basis are constrained neither in spatial layout nor in number, so a pose basis can describe body pose more flexibly than a k-poselet. In their method, the body pose in an image is represented by a sparse linear combination of pose bases, because the pose within an action varies while each image captures only a snapshot from a single viewpoint. The challenge in dictionary training is to stabilise the sparse representation, which serves as the input to a support vector machine (SVM) for action recognition, because the original pose signal is ambiguous and the dictionary is an overcomplete matrix. Their solution is to add the cumulative coherence of the dictionary as a penalty to the objective function, which drives the pose bases to become mutually incoherent. They evaluate the method on two popular datasets, and the experimental results show that the pose representation achieves encouraging performance in action recognition. Furthermore, they empirically exploit the complementary role of the local pose feature with respect to deep convolutional neural network features computed from the holistic image. Experimental results demonstrate a substantial performance improvement when the two features are concatenated.
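To make the pipeline concrete, the following is a minimal sketch of the kind of system the abstract describes, assembled from off-the-shelf components. It is an illustration under assumptions, not the authors' implementation: scikit-learn's DictionaryLearning stands in for their coherence-penalised dictionary trainer (it includes no cumulative-coherence penalty), and the feature dimensions, sparsity level, and variable names (X_pose, X_cnn, etc.) are hypothetical. The cumulative coherence mu_1(m) of a dictionary is the largest possible sum of m inner-product magnitudes between one atom and m other atoms; the sketch computes it only as a diagnostic of how incoherent the learned bases turned out to be.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-in data: per-image pose signals (e.g. poselet activation vectors),
# action labels, and holistic CNN features. All shapes are assumptions.
X_pose = rng.standard_normal((500, 150))
y = rng.integers(0, 10, size=500)
X_cnn = rng.standard_normal((500, 4096))

# Learn an overcomplete dictionary of pose bases (more atoms than signal
# dimensions) and sparse-code each pose signal with OMP. Unlike the
# authors' trainer, this objective carries no incoherence penalty.
dico = DictionaryLearning(n_components=300,
                          transform_algorithm="omp",
                          transform_n_nonzero_coefs=10,
                          random_state=0)
codes = dico.fit_transform(X_pose)   # sparse codes, one row per image

# Diagnostic: cumulative coherence mu_1(10) of the learned dictionary --
# the quantity the authors add to their objective as a penalty.
D = dico.components_
D = D / np.linalg.norm(D, axis=1, keepdims=True)
G = np.abs(D @ D.T)
np.fill_diagonal(G, 0.0)
mu1 = np.sort(G, axis=1)[:, ::-1][:, :10].sum(axis=1).max()
print(f"cumulative coherence mu_1(10) = {mu1:.3f}")

# Linear SVM on the sparse pose codes alone, then on the pose codes
# concatenated with the holistic CNN features.
svm_pose = LinearSVC().fit(codes, y)
svm_both = LinearSVC().fit(np.hstack([codes, X_cnn]), y)
```

The reason the penalty sits inside the training objective rather than being measured after the fact is that low cumulative coherence is what makes the sparse codes stable: for an incoherent dictionary, greedy pursuit recovers a near-unique support for each pose signal, so the SVM sees a consistent representation across images of the same action.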
