Histograms of sequences: a novel representation for human interaction recognition

This study presents a novel representation for human interaction recognition based on hierarchical histograms of local feature sequences. The method combines the power of discriminative sequence mining with histogram representations for the effective recognition of human interactions. The authors' framework first extracts visual features from the videos and then mines sequences of visual features that occur consecutively in space and time. After the mining step, each video is represented with a histogram pyramid of such sequences. The authors also propose using soft clustering in the visual-word construction step, so that more information-rich histograms can be obtained. Experimental results on challenging human interaction recognition data sets indicate that the proposed algorithm performs on par with state-of-the-art methods.
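As a rough illustration of the soft-clustering idea mentioned in the abstract, the sketch below soft-assigns local descriptors to a visual-word codebook with Gaussian weights, so every word receives a fractional vote instead of only the nearest word. This is a minimal sketch under assumed details: the Gaussian kernel, the bandwidth parameter sigma, and the function name soft_histogram are illustrative choices, not the authors' exact formulation.

    import numpy as np

    def soft_histogram(descriptors, codebook, sigma=1.0):
        """Build an L1-normalised soft-assignment histogram.

        descriptors : (n, d) array of local feature vectors
        codebook    : (k, d) array of visual-word centres
        sigma       : assumed Gaussian bandwidth for the soft weights
        """
        # Pairwise squared distances between descriptors and words: (n, k)
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        # Gaussian soft-assignment weights, normalised per descriptor
        w = np.exp(-d2 / (2.0 * sigma ** 2))
        w /= w.sum(axis=1, keepdims=True)
        # Accumulate fractional votes into a histogram over the k words
        hist = w.sum(axis=0)
        return hist / hist.sum()

    # Usage example with synthetic data: 500 random 64-D descriptors
    # against a hypothetical 100-word codebook
    rng = np.random.default_rng(0)
    desc = rng.normal(size=(500, 64))
    words = rng.normal(size=(100, 64))
    h = soft_histogram(desc, words, sigma=4.0)
    print(h.shape, h.sum())  # (100,) 1.0

Compared with hard assignment, each descriptor here spreads its mass over nearby words, which is one way such a histogram can become more information-rich, as the abstract suggests.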
