Structured RNN for human interaction


Understanding human activities has long been an important research area in computer vision. Human interactions can generally be modelled as a temporal sequence of transitions in the relationships between humans and objects, and many studies have demonstrated the effectiveness of long short-term memory (LSTM) networks on problems with long-term temporal dependencies. Here, the authors propose a novel structured recurrent neural network (S-RNN) to model the spatio-temporal relationships between human subjects and objects in daily human interactions. Several subnets represent the evolution of the individual components and of the relationships between them over time. The hidden representations produced by these subnets are then fused and fed into later layers to obtain the final hidden representation, from which a single-layer perceptron makes the final prediction. Experimental results on different tasks over the CAD-120, SBU-Kinect-Interaction, Multi-modal & Multi-view & Interactive, and NTU RGB+D data sets show the advantages of the proposed method compared with state-of-the-art methods.
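
To make the pipeline concrete, the following is a minimal sketch of the fused-subnet scheme described above, written in PyTorch. The framework choice, the class and stream names, and all dimensions are illustrative assumptions rather than the authors' implementation: each LSTM subnet processes one component or relation stream, the per-frame hidden states are concatenated, a later recurrent layer produces the final hidden representation, and a single linear layer plays the role of the single-layer perceptron.

    # Minimal sketch of the S-RNN pipeline described in the abstract.
    # Framework (PyTorch), names, and all sizes are illustrative
    # assumptions; this is not the authors' implementation.
    import torch
    import torch.nn as nn

    class SRNNSketch(nn.Module):
        def __init__(self, stream_dims, hidden_dim, num_classes):
            super().__init__()
            # One LSTM subnet per component/relation stream, e.g. the
            # human subject's pose, the object's features, their relation.
            self.subnets = nn.ModuleList(
                [nn.LSTM(d, hidden_dim, batch_first=True) for d in stream_dims]
            )
            # A later recurrent layer over the fused subnet representations.
            self.fusion_rnn = nn.LSTM(hidden_dim * len(stream_dims),
                                      hidden_dim, batch_first=True)
            # Single-layer perceptron for the final prediction.
            self.classifier = nn.Linear(hidden_dim, num_classes)

        def forward(self, streams):
            # streams: one (batch, time, dim) tensor per subnet.
            outs = [net(x)[0] for net, x in zip(self.subnets, streams)]
            fused = torch.cat(outs, dim=-1)         # fuse per-frame hidden states
            _, (h_n, _) = self.fusion_rnn(fused)    # final hidden representation
            return self.classifier(h_n.squeeze(0))  # class scores

    # Usage: three streams (subject, object, relation) over 30 frames.
    model = SRNNSketch(stream_dims=[48, 32, 16], hidden_dim=64, num_classes=10)
    scores = model([torch.randn(4, 30, d) for d in (48, 32, 16)])
    print(scores.shape)  # torch.Size([4, 10])

In the paper's setting, the streams would correspond to features of the human subjects, the objects, and their pairwise relations extracted from RGB-D or skeleton data.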

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2017.0487