RGB-D static gesture recognition based on convolutional neural network

Gesture recognition has long been a research hotspot in human–computer interaction (HCI) and computer vision. With the advent of depth cameras, gesture recognition using RGB-D data has gradually become mainstream in this field. However, how to exploit depth information effectively to build a robust gesture recognition system remains an open problem. In this paper, an RGB-D static gesture recognition method based on fine-tuning Inception V3 is proposed, which eliminates the gesture segmentation and hand-crafted feature extraction steps required by traditional algorithms. In contrast to generic CNN pipelines, the authors adopt a two-stage training strategy to fine-tune the model. The method adds a layer to the CNN structure that concatenates features from the RGB and depth images, using depth information to improve gesture recognition performance. Finally, on the American Sign Language (ASL) recognition dataset, the authors compare their method with traditional machine learning methods, other CNN algorithms, and an RGB-only variant of their method. Across the three groups of comparative experiments, the authors' method achieves the highest accuracy, 91.35%, which is the current state of the art on the ASL dataset.
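The abstract does not spell out the implementation, but the described design maps naturally onto standard deep-learning tooling. Below is a minimal PyTorch sketch, assuming two ImageNet-pretrained Inception V3 streams whose 2048-dimensional pooled features are concatenated before a new classifier head, trained in two stages (new head first, then end-to-end fine-tuning). The class name `RGBDGestureNet`, the 24-class output (the static ASL fingerspelling letters), the depth-channel replication, and all hyper-parameters are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of the two-stream RGB-D network described in the
# abstract: two Inception V3 backbones, a feature-concatenation layer,
# and a two-stage fine-tuning schedule. All sizes and hyper-parameters
# are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision import models


def make_backbone() -> nn.Module:
    """ImageNet-pretrained Inception V3 used as a 2048-d feature extractor."""
    net = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
    net.aux_logits = False   # drop the auxiliary classifier branch
    net.AuxLogits = None
    net.fc = nn.Identity()   # keep the 2048-d global-average-pooled features
    return net


class RGBDGestureNet(nn.Module):
    """Two Inception V3 streams joined by a feature-concatenation layer."""

    def __init__(self, num_classes: int = 24):  # 24 static ASL letters (assumption)
        super().__init__()
        self.rgb_stream = make_backbone()
        self.depth_stream = make_backbone()
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(2 * 2048, num_classes),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Depth maps are single-channel; replicating to 3 channels lets the
        # depth stream reuse the ImageNet-pretrained stem (a common trick).
        if depth.shape[1] == 1:
            depth = depth.repeat(1, 3, 1, 1)
        feats = torch.cat([self.rgb_stream(rgb), self.depth_stream(depth)], dim=1)
        return self.classifier(feats)


def set_requires_grad(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag


model = RGBDGestureNet(num_classes=24)

# Stage 1: freeze both pretrained streams, train only the new head.
set_requires_grad(model.rgb_stream, False)
set_requires_grad(model.depth_stream, False)
head_opt = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
# ... run the usual training loop with head_opt for a few epochs ...

# Stage 2: unfreeze the backbones and fine-tune end to end with a
# much smaller learning rate.
set_requires_grad(model.rgb_stream, True)
set_requires_grad(model.depth_stream, True)
finetune_opt = torch.optim.Adam(model.parameters(), lr=1e-5)
# ... continue training with finetune_opt ...

# Inception V3 expects 299x299 inputs; smoke test with random tensors:
logits = model(torch.randn(2, 3, 299, 299), torch.randn(2, 1, 299, 299))
print(logits.shape)  # torch.Size([2, 24])
```

The two-stage schedule is the usual recipe for fine-tuning a pretrained backbone: the randomly initialised head is trained first while the streams are frozen, so that large early gradients cannot destroy the ImageNet features; only then is the whole network unfrozen under a much smaller learning rate.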

Inspec keywords: human computer interaction; feature extraction; image segmentation; computer vision; image colour analysis; gesture recognition; neural nets; learning (artificial intelligence)

Other keywords: RGB-D camera; traditional machine learning methods; depth images; RGB-D static gesture recognition method; fine-tuning Inception V3; robust gesture recognition system; RGB input; computer vision; feature extraction; ASL recognition dataset; depth information; gesture segmentation; American Sign Language Recognition dataset; depth camera; CNN algorithms; RGB input only method; CNN structure; convolutional neural network

Subjects: Image recognition; Computer vision and image processing techniques; Knowledge engineering techniques; Neural nets (theory)
