Histograms of sequences: a novel representation for human interaction recognition

IET Computer Vision

This study presents a novel representation based on a hierarchical histogram of local feature sequences for human interaction recognition. The method combines the power of discriminative sequence mining with histogram representations for the effective recognition of human interactions. The framework first extracts visual features from the videos and then mines sequences of visual features that occur consecutively in space and time. After the mining step, each video is represented with a histogram pyramid of such sequences. The authors also propose using soft clustering in the visual word construction step, so that more information-rich histograms can be obtained. Experimental results on challenging human interaction recognition data sets indicate that the proposed algorithm performs on par with state-of-the-art methods.
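
The pipeline sketched in the abstract (soft-assigned visual words, mined feature sequences, a temporal histogram pyramid) can be illustrated compactly. The Python fragment below is a minimal sketch of that reading, not the authors' implementation: the function names, the Gaussian soft-assignment weights, the contiguous n-gram miner standing in for a full sequential-pattern algorithm, and the two-level temporal pyramid are all illustrative assumptions.

```python
import numpy as np
from collections import Counter

def soft_assign(descriptors, codebook, sigma=1.0):
    """Softly assign local descriptors to visual words (hypothetical scheme).

    Returns an (n_descriptors, n_words) weight matrix; soft clustering
    spreads each descriptor's mass over nearby codewords instead of
    committing to the single nearest centroid, yielding richer histograms.
    """
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)

def mine_frequent_sequences(word_streams, length=2, min_support=2):
    """Keep contiguous visual-word n-grams occurring in at least
    `min_support` videos -- a crude stand-in for a sequential-pattern
    miner over features that follow each other in space and time."""
    support = Counter()
    for stream in word_streams:
        support.update({tuple(stream[i:i + length])
                        for i in range(len(stream) - length + 1)})
    return [seq for seq, count in support.items() if count >= min_support]

def histogram_of_sequences(stream, vocab):
    """L1-normalised histogram of one video's word stream over mined sequences."""
    index = {seq: i for i, seq in enumerate(vocab)}
    h = np.zeros(len(vocab))
    L = len(vocab[0]) if vocab else 1
    for i in range(len(stream) - L + 1):
        j = index.get(tuple(stream[i:i + L]))
        if j is not None:
            h[j] += 1
    return h / max(h.sum(), 1.0)

def pyramid_histogram(stream, vocab, levels=2):
    """Concatenate sequence histograms over a temporal pyramid:
    level l splits the word stream into 2**l equal segments."""
    parts = []
    for l in range(levels):
        k = 2 ** l
        seg = max(len(stream) // k, 1)
        parts.extend(histogram_of_sequences(stream[s * seg:(s + 1) * seg], vocab)
                     for s in range(k))
    return np.concatenate(parts)

# Toy usage: three videos, each a temporally ordered stream of word ids.
streams = [[0, 1, 2, 1, 2], [0, 1, 2, 2], [3, 0, 1, 2]]
vocab = mine_frequent_sequences(streams, length=2, min_support=2)
videos = np.stack([pyramid_histogram(s, vocab) for s in streams])
print(vocab)          # e.g. [(0, 1), (1, 2)]
print(videos.shape)   # (3, 3 * len(vocab)) for a two-level pyramid
```

In practice, `soft_assign` would replace the hard word ids in the toy streams with weighted assignments, and the concatenated pyramid vectors would feed a standard classifier; both details are assumptions here, since the abstract does not specify them.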

DOI: 10.1049/iet-cvi.2017.0471