Deep probabilistic human pose estimation

Ilia Petrov; Vlad Shakhuro; Anton Konushin

Author(s): Ilia Petrov^{1}; Vlad Shakhuro^{1,2}; Anton Konushin^{1,2}
- Affiliations: 1: Faculty of Computational Mathematics and Cybernetics, Moscow State University, Moscow 119234, Russia;
  2: Faculty of Computer Science, National Research University Higher School of Economics, Moscow 101000, Russia
Source: Volume 12, Issue 5, August 2018, pp. 578–585
DOI: 10.1049/iet-cvi.2017.0382, Print ISSN 1751-9632, Online ISSN 1751-9640

Received 11/08/2017, Revised 24/01/2018, Accepted 01/02/2018, Published 02/02/2018

The authors consider the problem of human pose estimation using probabilistic convolutional neural networks (CNNs). They explore ways to improve pose estimation accuracy on two standard benchmarks, the MPII Human Pose and Leeds Sports Pose (LSP) datasets, using frameworks for probabilistic deep learning. Such frameworks transform a deterministic neural network into a probabilistic one and allow sampling of independent and equiprobable hypotheses (different outputs) for a given input. Overlapping body parts and body joints hidden under clothes or behind other obstacles make the problem of human pose estimation ambiguous. To estimate joint positions accurately in this setting, they use the uncertainty in the network's predictions, represented by the variance of the hypotheses sampled from a probabilistic CNN, while confidence is characterised by the mean of those hypotheses. Their work is based on current CNN cascades for pose estimation. They propose and evaluate three probabilistic CNNs built on top of deterministic ones with two probabilistic deep learning frameworks: DISCO networks and Bayesian SegNet. The authors evaluate their models on standard pose estimation benchmarks and show that the proposed probabilistic models outperform the base deterministic ones.
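
As a rough illustration of how the mean and variance of sampled hypotheses might be combined into a joint estimate, the NumPy sketch below aggregates a stack of sampled outputs. It is not the authors' implementation: the per-joint heatmap representation, the argmax rule for choosing a joint position, and the names `aggregate_hypotheses` and `toy_stochastic_heatmaps` are all assumptions made for this example, and the toy sampler merely stands in for a probabilistic CNN.

```python
import numpy as np


def aggregate_hypotheses(samples):
    """Summarise sampled hypotheses for each joint.

    samples: array of shape (K, J, H, W), i.e. K sampled heatmaps for each
    of J joints (an assumed output format, not taken from the paper).
    Returns (joints, confidence, uncertainty):
      joints      -- (J, 2) integer (x, y) position of the mean-map maximum
      confidence  -- (J,) value of the mean map at that position
      uncertainty -- (J,) variance across samples at that position
    """
    samples = np.asarray(samples, dtype=float)
    mean_map = samples.mean(axis=0)  # confidence per pixel
    var_map = samples.var(axis=0)    # uncertainty per pixel
    n_joints, height, width = mean_map.shape

    # Locate the maximum of each joint's mean map.
    flat_idx = mean_map.reshape(n_joints, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (height, width))

    joint_ids = np.arange(n_joints)
    joints = np.stack([xs, ys], axis=1)
    confidence = mean_map[joint_ids, ys, xs]
    uncertainty = var_map[joint_ids, ys, xs]
    return joints, confidence, uncertainty


def toy_stochastic_heatmaps(rng, n_samples=20, size=16):
    """Hypothetical stand-in for a probabilistic CNN.

    Each call to a real probabilistic network would yield one hypothesis;
    here every sample is a Gaussian blob whose centre is randomly jittered,
    with the second joint jittered more to mimic an ambiguous (occluded) joint.
    """
    yy, xx = np.mgrid[0:size, 0:size]
    centres = np.array([[4.0, 5.0], [11.0, 9.0]])  # (x, y) per joint
    jitter = np.array([0.3, 2.0])                  # std of centre noise
    samples = np.empty((n_samples, len(centres), size, size))
    for k in range(n_samples):
        for j, (cx, cy) in enumerate(centres):
            dx, dy = rng.normal(0.0, jitter[j], size=2)
            dist2 = (xx - (cx + dx)) ** 2 + (yy - (cy + dy)) ** 2
            samples[k, j] = np.exp(-dist2 / (2 * 1.5 ** 2))
    return samples


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    joints, conf, unc = aggregate_hypotheses(toy_stochastic_heatmaps(rng))
    for j, (xy, c, u) in enumerate(zip(joints, conf, unc)):
        print(f"joint {j}: position={tuple(xy)} confidence={c:.3f} variance={u:.4f}")
```

In this sketch the variance is only read out at the chosen position; a fuller treatment could also use the variance map to re-weight or reject unreliable joints, which the abstract does not specify.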
