Data-driven recovery of hand depth using CRRF on stereo images

Hand pose is emerging as an important interface for human–computer interaction. This study presents a data-driven method for estimating a high-quality depth map of a hand from stereoscopic camera input, introducing a novel superpixel-based regression framework that exploits the smoothness of the hand's depth surface. To this end, the authors introduce the conditional regressive random forest (CRRF), a method that combines a conditional random field (CRF) with a regressive random forest (RRF) to model the mapping from a stereo red, green and blue (RGB) image pair to a depth image. The RRF provides a unary term that adaptively selects among different stereo-matching measures as it implicitly determines matching pixels in a coarse-to-fine manner. While the RRF predicts the depth of each superpixel independently, the CRF unifies these predictions by modelling pairwise interactions between adjacent superpixels. Experimental results show that CRRF generates depth images more accurately than leading contemporary techniques while using an inexpensive stereo camera.
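
A minimal, self-contained sketch of this two-term structure is given below. It is illustrative only, not the authors' implementation: a scikit-learn RandomForestRegressor stands in for the RRF unary term, a simple quadratic smoothing pass over a toy chain-shaped superpixel graph stands in for the CRF pairwise term, and synthetic per-superpixel features replace the paper's adaptive stereo-matching measures and coarse-to-fine matching. All names and data here are assumptions made for the example.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def crf_smooth(unary, neighbours, weights, lam=2.0, iters=100):
    # Pull each superpixel's depth towards its unary (forest) estimate and the
    # current depths of its neighbours: a quadratic stand-in for CRF inference.
    d = unary.copy()
    for _ in range(iters):
        for i in range(len(d)):
            nb, w = neighbours[i], weights[i]
            d[i] = (unary[i] + lam * np.dot(w, d[nb])) / (1.0 + lam * w.sum())
    return d

def make_scene(n, f, rng):
    # Synthetic stand-in data: a smooth per-superpixel depth profile, plus
    # features whose first column is a noisy depth cue and the rest is noise.
    depth = np.sin(np.linspace(0.0, 3.0 * np.pi, n))
    feats = np.column_stack([depth + 0.3 * rng.normal(size=n),
                             rng.normal(size=(n, f - 1))])
    return feats, depth

rng = np.random.default_rng(0)
n_sp, n_feat = 200, 16
X_train, y_train = make_scene(n_sp, n_feat, rng)   # 'training image'
X_test, y_test = make_scene(n_sp, n_feat, rng)     # 'test image'

# Toy superpixel adjacency (a chain); a real graph would come from, e.g., SLIC.
neighbours = [np.array([j for j in (i - 1, i + 1) if 0 <= j < n_sp])
              for i in range(n_sp)]
weights = [np.ones(len(nb)) for nb in neighbours]

# Unary term: forest regression from per-superpixel features to depth.
rrf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)
unary = rrf.predict(X_test)

# Pairwise term: smooth the independent predictions across adjacent superpixels.
smoothed = crf_smooth(unary, neighbours, weights)
print('unary RMSE:   ', np.sqrt(np.mean((unary - y_test) ** 2)))
print('smoothed RMSE:', np.sqrt(np.mean((smoothed - y_test) ** 2)))

Because the synthetic depth profile varies smoothly across neighbouring superpixels, the pairwise pass typically reduces the error of the independent per-superpixel predictions, which mirrors the motivation for coupling the CRF with the RRF.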

Inspec keywords: image segmentation; image classification; cameras; regression analysis; image matching; stereo image processing; pose estimation

Other keywords: conditional random field; depth surface; stereoscopic camera input; high-quality depth map; stereo-matching measures; conditional regressive random forest; stereo RGB image pair; stereo images; pairwise interactions; CRRF; depth image; hand depth; superpixel-based regression framework; human–computer interaction; data-driven method; CRF; RRF; depth prediction; adjacent superpixels; inexpensive stereo camera; hand pose

Subjects: Optical, image and video signal processing; Computer vision and image processing techniques; Other topics in statistics
