Multi-view pose estimation with mixtures of parts and adaptive viewpoint selection

IET Computer Vision

We propose a new method for human pose estimation that leverages information from multiple views to impose a strong prior on articulated pose. The novelty of the method lies in the types of coherence it models. Consistency across views is maximised through separate terms: one models classical geometric information (coherence of the resulting poses), and another models appearance information, represented as latent variables in the global energy function. Moreover, the adequacy of each view is assessed and its contribution is adjusted accordingly. Experiments on the HumanEva and Utrecht multi-person motion datasets show that the proposed method significantly decreases the estimation error compared to single-view results.
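As a rough illustration of the idea of weighting views by their adequacy inside a global energy, the following Python sketch combines per-view energies with a cross-view consistency penalty. It is not the formulation used in the article: the softmax weighting, the function names `view_weights` and `combined_energy`, and the parameters `temperature` and `lam` are all assumptions introduced here for illustration only.

```python
import math


def view_weights(scores, temperature=1.0):
    """Turn per-view adequacy scores into weights that sum to 1.

    Assumption: a softmax is used here purely for illustration;
    the article does not specify this weighting.
    """
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combined_energy(view_energies, scores, consistency_penalty, lam=1.0):
    """Weighted sum of single-view energies plus a consistency term.

    view_energies: one energy value per view (lower is better).
    scores: one hypothetical adequacy score per view (higher is better).
    consistency_penalty: hypothetical scalar measuring cross-view
        disagreement of the estimated poses.
    lam: hypothetical trade-off weight for the consistency term.
    """
    weights = view_weights(scores)
    data_term = sum(w * e for w, e in zip(weights, view_energies))
    return data_term + lam * consistency_penalty
```

With equal adequacy scores, each view contributes equally; raising one view's score shifts the data term towards that view's energy.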

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2017.0146