Stacked residual blocks based encoder–decoder framework for human motion prediction

Human motion prediction is an important and challenging task in computer vision with various applications. Recurrent neural networks (RNNs) and convolutional neural networks (CNNs) have been proposed for this task. However, RNNs are limited in long-term temporal modelling and in spatial modelling of motion signals, while CNNs offer inflexible spatial and temporal modelling that depends mainly on large convolutional kernels and the stride of the convolution. Moreover, these methods predict multiple future poses recursively, which makes them prone to noise accumulation. The authors present a new encoder–decoder framework based on residual convolutional blocks with small filters to predict future human poses, which flexibly captures hierarchical spatial and temporal representations of the human motion signals from the motion capture sensor. Specifically, the encoder is a stack of residual convolutional blocks that hierarchically encodes the spatio-temporal features of previous poses. The decoder is built with two fully connected layers that reconstruct the spatial and temporal information of future poses in a non-recursive manner, avoiding the noise accumulation that affects prior works. Experimental results on the Human3.6M dataset show that the proposed method outperforms the baselines, demonstrating its effectiveness. The code is available at https://github.com/lily2lab/residual_prediction_network.
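The architecture described above can be sketched in a few lines: an encoder that stacks residual 1D convolutional blocks with a small (size-3) kernel over the pose sequence, followed by a decoder of two fully connected layers that emits all future poses in a single non-recursive step. This is a minimal NumPy sketch, not the authors' released implementation; the layer widths, the number of blocks, the 54-dimensional pose vector, and the random weights are illustrative assumptions only.

```python
import numpy as np

def conv1d(x, w, b):
    # x: (T, C_in); w: (k, C_in, C_out); 'same' padding along time.
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.stack([np.tensordot(xp[t:t + k], w, axes=([0, 1], [0, 1]))
                    for t in range(x.shape[0])])
    return out + b

def residual_block(x, w1, b1, w2, b2):
    # Two small-kernel convolutions with a skip connection (He et al. style).
    h = np.maximum(conv1d(x, w1, b1), 0.0)      # ReLU
    h = conv1d(h, w2, b2)
    return np.maximum(h + x, 0.0)               # residual add, then ReLU

rng = np.random.default_rng(0)
T_in, T_out, D = 10, 5, 54      # observed frames, predicted frames, pose dims
x = rng.standard_normal((T_in, D))

# Encoder: a stack of residual convolutional blocks (kernel size 3).
h = x
for _ in range(3):
    w1 = rng.standard_normal((3, D, D)) * 0.05
    w2 = rng.standard_normal((3, D, D)) * 0.05
    h = residual_block(h, w1, np.zeros(D), w2, np.zeros(D))

# Decoder: two fully connected layers produce all future poses at once,
# i.e. non-recursively, so per-step prediction noise cannot accumulate.
z = h.reshape(-1)                                # flatten spatio-temporal features
W1 = rng.standard_normal((256, z.size)) * 0.02
W2 = rng.standard_normal((T_out * D, 256)) * 0.02
future = (W2 @ np.maximum(W1 @ z, 0.0)).reshape(T_out, D)
print(future.shape)
```

The key design point mirrored here is the decoder: because the two fully connected layers map the encoded features directly to the whole future window, no predicted pose is fed back as input, which is what distinguishes this framework from recursive RNN/CNN predictors.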

http://iet.metastore.ingenta.com/content/journals/10.1049/ccs.2020.0008