Histograms of sequences: a novel representation for human interaction recognition

This study presents a novel representation based on hierarchical histograms of local feature sequences for human interaction recognition. The authors' method combines the power of discriminative sequence mining and histogram representation for the effective recognition of human interactions. The framework first extracts visual features from the videos and then mines sequences of visual features that occur consecutively in space and time. After the mining step, each video is represented by a histogram pyramid of such sequences. The authors also propose soft clustering in the visual word construction step, so that more information-rich histograms can be obtained. Experimental results on challenging human interaction recognition data sets indicate that the proposed algorithm performs on par with state-of-the-art methods.
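
As a rough illustration of two ideas named in the abstract, soft assignment of local features to visual words and a pyramid of histograms, the following Python sketch may help. It is not the authors' implementation: the Gaussian weighting, the `sigma` parameter, the purely temporal pyramid and all function names are assumptions made for this example, and the sequence mining step is omitted entirely, so the histograms here count visual words rather than mined sequences.

```python
import math


def soft_assign(descriptor, codebook, sigma=1.0):
    """Spread one descriptor over all visual words with Gaussian weights.

    Assumption: a Gaussian kernel on squared Euclidean distance; the paper's
    abstract only says "soft clustering" without naming a scheme.
    """
    sq_dists = [sum((d - w) ** 2 for d, w in zip(descriptor, word))
                for word in codebook]
    nearest = min(sq_dists)  # subtract the minimum to avoid underflow to 0
    weights = [math.exp(-(sd - nearest) / (2.0 * sigma ** 2)) for sd in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def soft_histogram(descriptors, codebook, sigma=1.0):
    """Average the soft assignments of a set of descriptors."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        for k, w in enumerate(soft_assign(desc, codebook, sigma)):
            hist[k] += w
    n = len(descriptors)
    return [h / n for h in hist] if n else hist


def temporal_pyramid(features, codebook, num_frames, levels=2, sigma=1.0):
    """Concatenate histograms over 1, 2, 4, ... equal temporal segments.

    `features` is a list of (frame_index, descriptor) pairs (hypothetical
    input format chosen for this sketch).
    """
    pyramid = []
    for level in range(levels):
        segments = 2 ** level
        for s in range(segments):
            start = s * num_frames / segments
            end = (s + 1) * num_frames / segments
            in_segment = [d for t, d in features if start <= t < end]
            pyramid.extend(soft_histogram(in_segment, codebook, sigma))
    return pyramid


# Toy usage: two 2-D visual words and four features in a 4-frame clip.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [(0, [0.0, 0.0]), (1, [1.0, 1.0]), (2, [0.5, 0.5]), (3, [1.0, 1.0])]
video_vector = temporal_pyramid(features, codebook, num_frames=4, levels=2)
```

With two pyramid levels and two visual words, `video_vector` has (1 + 2) x 2 = 6 entries. Unlike hard assignment, a descriptor lying between two words contributes to both bins, which is one plausible reading of the "more information-rich histograms" the abstract mentions.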

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2017.0471