Gesture recognition algorithm based on multi-scale feature fusion in RGB-D images

With the rapid development of sensor technology and artificial intelligence, video gesture recognition in the era of big data makes human–computer interaction more natural and flexible, bringing a richer interactive experience to teaching, on-board control, electronic games, etc. To achieve robust recognition under illumination change, background clutter, rapid movement, and partial occlusion, an algorithm based on multi-level feature fusion in a two-stream convolutional neural network is proposed, comprising three main steps. First, a Kinect sensor acquires red–green–blue-depth (RGB-D) images to build a gesture database. Next, data augmentation is applied to the training and test sets. Then, a multi-level feature fusion model of a two-stream convolutional neural network is established and trained. Experiments show that the proposed network can robustly track and recognise gestures against complex backgrounds (such as skin-like colours, illumination changes, and occlusion); compared with the single-channel model, the average detection accuracy is improved by 1.08% and the mean average precision is improved by 3.56%.
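The abstract outlines a two-stream convolutional network in which RGB and depth features are fused at several levels before classification. The sketch below is a minimal PyTorch illustration of one way such a model can be wired; the framework choice, layer widths, the three fusion levels, the 1×1 fusion convolutions, the input resolution, and the class count are assumptions for illustration only, not the authors' published architecture.

```python
# Minimal sketch of a two-stream CNN with multi-level RGB-D feature fusion.
# All layer sizes and the number of fusion levels are illustrative assumptions.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """3x3 conv + BN + ReLU followed by 2x2 max-pooling (halves spatial size)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class TwoStreamFusionNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # RGB stream (3-channel input) and depth stream (1-channel input),
        # each producing features at three levels.
        self.rgb_blocks = nn.ModuleList(
            [conv_block(3, 32), conv_block(32, 64), conv_block(64, 128)])
        self.depth_blocks = nn.ModuleList(
            [conv_block(1, 32), conv_block(32, 64), conv_block(64, 128)])
        # One 1x1 fusion conv per level, applied to the concatenated streams.
        self.fuse = nn.ModuleList(
            [nn.Conv2d(2 * c, c, kernel_size=1) for c in (32, 64, 128)])
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Classifier over the pooled multi-level features (32 + 64 + 128 channels).
        self.fc = nn.Linear(32 + 64 + 128, num_classes)

    def forward(self, rgb, depth):
        fused_feats = []
        for rgb_block, depth_block, fuse in zip(self.rgb_blocks, self.depth_blocks, self.fuse):
            rgb = rgb_block(rgb)
            depth = depth_block(depth)
            # Level-wise fusion: concatenate the two streams along channels,
            # then mix them with a 1x1 convolution and pool to a vector.
            fused = fuse(torch.cat([rgb, depth], dim=1))
            fused_feats.append(self.pool(fused).flatten(1))
        return self.fc(torch.cat(fused_feats, dim=1))


if __name__ == "__main__":
    model = TwoStreamFusionNet(num_classes=10)
    rgb = torch.randn(2, 3, 96, 96)     # batch of RGB gesture crops
    depth = torch.randn(2, 1, 96, 96)   # aligned depth maps (e.g. from a Kinect)
    print(model(rgb, depth).shape)      # torch.Size([2, 10])
```

In this arrangement the two streams run in parallel end to end, and the concatenated per-level descriptors feed a single classifier; whether the original network fuses in this late, level-wise manner or re-injects fused features back into the streams is not specified in the abstract.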

Inspec keywords: artificial intelligence; gesture recognition; convolutional neural nets; image colour analysis; video signal processing; image sensors

Other keywords: RGB-D images; gesture database; illumination change; Big Data; video gesture recognition technology; two-stream convolutional neural network; sensor technology; robust recognition; multiscale feature fusion; data enhancement; partial occlusion; Kinect sensor; artificial intelligence; on-board control; background clutter; multilevel feature fusion; red-green-blue-depth images; complex backgrounds; gesture recognition algorithm; electronic games

Subjects: Image recognition; Neural computing techniques; Video signal processing; Computer vision and image processing techniques; Image sensors
