Human-action recognition using a multi-layered fusion scheme of Kinect modalities

IET Computer Vision

This study addresses the problem of efficiently combining the joint, RGB and depth modalities of the Kinect sensor in order to recognise human actions. For this purpose, a multi-layered fusion scheme concatenates different specific features, builds specialised local and global SVM models and then iteratively fuses their scores. The authors contribute at two levels: (i) they combine the performance of local descriptors with the strength of global bag-of-visual-words representations, which yields improved local decisions that can handle noisy frames; (ii) they study the performance of multiple fusion schemes based on feature concatenation, concatenation of Fisher vector representations and subsequent iterative score fusion. To demonstrate the efficiency of their approach, they evaluate it on two challenging public datasets: CAD-60 and CGC-2014. Competitive results are obtained on both benchmarks.
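The abstract does not specify the fusion rule, so the following is only a minimal sketch of the two general ideas it names: feature concatenation (early fusion) and iterative fusion of classifier scores (late fusion). The function names, the min-max normalisation, the weighted-average rule and the example scores are all illustrative assumptions, not the authors' method; in practice the scores would come from trained SVM models.

```python
from typing import List, Sequence


def concatenate_features(*feature_vectors: Sequence[float]) -> List[float]:
    """Early fusion: join per-modality descriptors into one vector."""
    fused: List[float] = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def min_max_normalise(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so different classifiers are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(score_sets: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Late fusion (assumed rule): weighted average of normalised scores."""
    total = sum(weights)
    fused = [0.0] * len(score_sets[0])
    for scores, w in zip(score_sets, weights):
        for i, s in enumerate(min_max_normalise(scores)):
            fused[i] += (w / total) * s
    return fused


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Hypothetical per-class scores from three modality-specific classifiers.
joint_scores = [0.2, 1.5, -0.3]
rgb_scores = [0.9, 0.1, -1.0]
depth_scores = [0.1, 0.8, 0.0]

# Layer 1: fuse the two image-based modalities.
visual = fuse_scores([rgb_scores, depth_scores], weights=[1.0, 1.0])
# Layer 2: fuse that result with the skeleton-joint scores.
final = fuse_scores([joint_scores, visual], weights=[1.0, 1.0])
predicted_class = predict(final)
```

Fusing in layers rather than in one step lets each stage use its own weights, which is one plausible reading of "iterative" fusion; equal weights are used here purely for illustration.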

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2016.0326