Mixing body-parts model for 2D human pose estimation in stereo videos

This study targets 2D articulated human pose estimation (i.e. localisation of body limbs) in stereo videos. Although depth-based devices (e.g. Microsoft Kinect) have gained popularity in recent years, as they perform very well in controlled indoor environments (e.g. living rooms, operating theatres or gyms), they suffer clear problems in outdoor scenarios and, therefore, human pose estimation remains an interesting unsolved problem. The authors propose a novel approach that localises upper-body keypoints (i.e. shoulders, elbows, and wrists) in temporal sequences of stereo image pairs. The authors' method starts by locating and segmenting people in the image pairs using disparity and appearance information. Then, a set of candidate body poses is computed for each view independently. Finally, temporal and stereo consistency is applied to estimate a final 2D pose. The authors validate their model on three challenging datasets: 'stereo human pose estimation dataset', 'poses in the wild' and 'INRIA 3DMovie'. The experimental results show that their model not only establishes new state-of-the-art results on stereo sequences, but also brings improvements in monocular sequences.
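The three-stage pipeline summarised above (segmentation, per-view candidate poses, consistency-based selection) can be sketched as follows. This is a minimal toy illustration under stated assumptions, not the authors' implementation: every function and the scoring rule are illustrative stand-ins, and the segmentation stage is reduced to a comment.

```python
# Toy sketch of the pipeline in the abstract. All names and the scoring
# rule are hypothetical placeholders, not the authors' method.

def candidate_poses(view, n=3):
    """Stand-in for stage 2: n candidate upper-body keypoint sets per view.

    In the paper, candidates come from a body-parts model run on the
    person region segmented in stage 1 (disparity + appearance); here each
    candidate is just tagged with its view and an index.
    """
    return [{"shoulder": (view, k), "elbow": (view, k), "wrist": (view, k)}
            for k in range(n)]

def pair_score(pose_l, pose_r, prev):
    """Stand-in cost combining stereo agreement (left vs. right candidate)
    with temporal smoothness (left candidate vs. previous frame)."""
    stereo = abs(pose_l["wrist"][1] - pose_r["wrist"][1])
    temporal = 0 if prev is None else abs(pose_l["wrist"][1] - prev["wrist"][1])
    return stereo + temporal

def estimate_poses(n_frames=4):
    """For each stereo frame pair: generate candidates per view
    independently, then pick the stereo- and temporally-consistent pair."""
    track, prev = [], None
    for _ in range(n_frames):
        # Stage 1 (omitted here): segment the person in each view
        # using disparity and appearance information.
        cands_l = candidate_poses("L")
        cands_r = candidate_poses("R")
        # Stage 3: choose the candidate pair minimising the joint cost.
        best = min(((pl, pr) for pl in cands_l for pr in cands_r),
                   key=lambda p: pair_score(p[0], p[1], prev))
        prev = best[0]
        track.append(best)
    return track
```

In this toy version the selected left and right candidates always share the same index, since the cost penalises any stereo or temporal disagreement; the paper's actual formulation optimises consistency over real keypoint locations across views and frames.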

Inspec keywords: pose estimation; stereo image processing; image sequences; video signal processing; image sensors

Other keywords: Microsoft Kinect; temporal sequences; stereo human pose estimation dataset; INRIA 3DMovie; stereo videos; mixing body-parts model; body poses; stereo consistency; stereo image pairs; 2D articulated human pose estimation; temporal consistency; controlled indoor environments; monocular sequences; localise upper-body keypoints; stereo sequences

Subjects: Optical, image and video signal processing; Image sensors; Computer vision and image processing techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2016.0249