© The Institution of Engineering and Technology
Hand pose is emerging as an important interface for human–computer interaction. This study presents a data-driven method for estimating a high-quality depth map of a hand from stereoscopic camera input. It introduces a novel superpixel-based regression framework that exploits the smoothness of the hand's depth surface. To this end, the authors introduce the conditional regressive random forest (CRRF), a method that combines a conditional random field (CRF) with a regressive random forest (RRF) to model the mapping from a stereo red, green and blue (RGB) image pair to a depth image. The RRF provides a unary term that adaptively selects among different stereo-matching measures as it implicitly determines matching pixels in a coarse-to-fine manner. While the RRF predicts depth for each superpixel independently, the CRF unifies these predictions by modelling pairwise interactions between adjacent superpixels. Experimental results show that, using an inexpensive stereo camera, CRRF generates depth images more accurately than the leading contemporary techniques.
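The interplay between independent per-superpixel predictions and pairwise smoothing can be illustrated with a minimal sketch. It is not the authors' CRRF formulation: it assumes a simple quadratic (Gaussian) energy, in which each superpixel's depth stays close to its independent prediction while differing little from its neighbours, and it solves that energy in closed form. The function name `refine_depths`, the weight `lam` and the toy adjacency list are hypothetical and chosen only for illustration.

```python
import numpy as np

def refine_depths(unary, edges, weights=None, lam=1.0):
    """Smooth independent per-superpixel depth predictions.

    Minimises  sum_i (d_i - unary_i)^2 + lam * sum_(i,j) w_ij (d_i - d_j)^2,
    an assumed quadratic stand-in for a unary-plus-pairwise CRF energy.
    The minimiser solves the linear system (I + lam * L) d = unary,
    where L is the weighted graph Laplacian of the superpixel adjacency.
    """
    unary = np.asarray(unary, dtype=float)
    n = unary.shape[0]
    if weights is None:
        weights = np.ones(len(edges))  # equal trust in every adjacency

    # Build the weighted graph Laplacian from the adjacency list.
    laplacian = np.zeros((n, n))
    for (i, j), w in zip(edges, weights):
        laplacian[i, i] += w
        laplacian[j, j] += w
        laplacian[i, j] -= w
        laplacian[j, i] -= w

    return np.linalg.solve(np.eye(n) + lam * laplacian, unary)

# Toy example: three superpixels in a chain, the middle one is an outlier.
independent = [1.0, 5.0, 1.0]      # hypothetical unary depth predictions
adjacency = [(0, 1), (1, 2)]       # hypothetical superpixel neighbours
refined = refine_depths(independent, adjacency, lam=1.0)
print(refined)
```

In this toy case the outlying middle prediction is pulled towards its neighbours while the mean depth is preserved. Setting `lam=0` returns the independent predictions unchanged, which corresponds to using the unary term alone.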