Human interaction recognition fusing multiple features of depth sequences

Human interaction recognition plays a major role in building intelligent video surveillance systems. Recently, depth data captured by emerging RGB-D sensors have begun to show their importance in human interaction recognition. This study proposes a novel framework for human interaction recognition using depth information, including an algorithm to reconstruct a depth sequence with as few key frames as possible. The proposed framework comprises two essential modules. First, key frames are extracted under a sparse-reconstruction constraint; then a fused multi-feature representation is built from two types of available features using max-pooling. Finally, the fused features are fed directly to a support vector machine (SVM) for recognition of the human activity. The study explores a static and dynamic feature fusion method that improves recognition performance by exploiting the contextual relevance of consecutive frames. A weight is used to fuse shape and optical-flow features, which not only enhances the description of human behavioural characteristics in the spatiotemporal domain, but also effectively reduces the adverse impact of distorted interest points on target recognition. Experimental results show that the proposed approach yields considerable accuracy improvements over state-of-the-art approaches on a public action dataset.
