Human interaction recognition fusing multiple features of depth sequences

Human interaction recognition plays a major role in building intelligent video surveillance systems. Recently, depth data captured by emerging RGB-D sensors has begun to show its importance for human interaction recognition. This study proposes a novel framework for human interaction recognition from depth information, including an algorithm that reconstructs a depth sequence from as few key frames as possible. The framework comprises two essential modules: key-frame extraction under a sparsity constraint, followed by multi-feature fusion, which combines two types of available features using max-pooling. Finally, the fused features are fed directly to an SVM to recognise the human activity. The study explores a static and dynamic feature-fusion method that exploits the contextual relevance of consecutive frames to improve recognition performance. A weighting scheme fuses shape and optical-flow features, which not only enhances the description of human behavioural characteristics in the spatio-temporal domain but also effectively reduces the adverse impact of distorted interest points on target recognition. Experimental results show that the proposed approach yields a considerable accuracy improvement over state-of-the-art approaches on a public action dataset.
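
The abstract describes a pipeline of sparsity-constrained key-frame selection, weighted fusion of static (shape) and dynamic (optical-flow) features with max-pooling, and SVM classification. A minimal sketch of that pipeline follows, assuming NumPy and scikit-learn; the histogram descriptors, the fusion weight alpha, and the greedy key-frame selector are illustrative stand-ins, not the authors' actual components, which the abstract does not specify.

    import numpy as np
    from sklearn.svm import SVC

    def select_key_frames(features, k):
        """Greedy sparsity-constrained selection: keep the k frames whose
        span best reconstructs the whole sequence (a simplified stand-in
        for minimum-sparse-reconstruction key-frame extraction).
        `features` is an (n_frames, dim) array of per-frame descriptors."""
        selected = []
        for _ in range(k):
            best_idx, best_err = None, np.inf
            for i in range(len(features)):
                if i in selected:
                    continue
                D = features[selected + [i]].T                # dictionary, dim x m
                coef, *_ = np.linalg.lstsq(D, features.T, rcond=None)
                err = np.linalg.norm(features.T - D @ coef)   # reconstruction error
                if err < best_err:
                    best_idx, best_err = i, err
            selected.append(best_idx)
        return sorted(selected)

    def shape_feature(frame):
        """Placeholder static (shape) descriptor: an intensity histogram."""
        hist, _ = np.histogram(frame, bins=64, range=(0, 255), density=True)
        return hist

    def flow_feature(prev_frame, frame):
        """Placeholder dynamic descriptor from frame differencing,
        standing in for a true optical-flow histogram."""
        diff = frame.astype(np.float32) - prev_frame.astype(np.float32)
        hist, _ = np.histogram(diff, bins=64, range=(-255, 255), density=True)
        return hist

    def sequence_descriptor(key_frames, alpha=0.6):
        """Weighted fusion of shape and flow features for each consecutive
        pair of key frames, then max-pooling over the sequence."""
        fused = [alpha * shape_feature(cur) + (1.0 - alpha) * flow_feature(prev, cur)
                 for prev, cur in zip(key_frames, key_frames[1:])]
        return np.max(np.stack(fused), axis=0)                # max-pooling

    def train_and_predict(train_seqs, train_labels, test_seqs):
        """train_seqs / test_seqs: lists of key-frame lists (2-D depth arrays)."""
        X_train = np.stack([sequence_descriptor(s) for s in train_seqs])
        X_test = np.stack([sequence_descriptor(s) for s in test_seqs])
        clf = SVC(kernel="rbf").fit(X_train, train_labels)
        return clf.predict(X_test)

In this sketch the weight alpha trades the static shape cue against the motion cue; the abstract states that such a weight is used but does not give its value, so 0.6 here is arbitrary. The SVM kernel is likewise an assumption.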
