Human action recognition (HAR) is challenging because of intra-class variations and complex backgrounds. Here, a motion history image (MHI)-based interest point refinement is proposed to remove noisy interest points. The histogram of oriented gradients (HOG) and histogram of optical flow (HOF) descriptors are extended from the spatial to the spatio-temporal domain to preserve temporal information. These local features are used to build the trees of a random forest. During tree building, a semi-supervised learning scheme is proposed to split the data points better at each node. For action recognition, all extracted interest points are passed through the random forest and their mutual information with each trained class is estimated. The proposed method is evaluated on the standard KTH, Weizmann, and UCF Sports datasets. The experimental results indicate that the proposed technique outperforms earlier reported techniques.
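The MHI-based refinement step can be illustrated with a minimal sketch. It assumes the classical temporal-template update rule (a pixel that moved is set to a duration `tau`, and every other pixel decays by one per frame) together with a simple hypothetical refinement criterion: an interest point is kept only if the MHI value at its location is at least `min_value`. The abstract does not state the exact criterion, so the function names, the frame-difference threshold, and the filtering rule below are illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # grayscale frame stored as rows of pixel intensities


def update_mhi(mhi: Frame, prev: Frame, curr: Frame,
               tau: int, diff_threshold: int) -> Frame:
    """Apply one MHI update step.

    A pixel whose intensity changed by at least diff_threshold between
    prev and curr is set to tau; every other pixel decays by 1, floored at 0.
    """
    updated = []
    for mhi_row, prev_row, curr_row in zip(mhi, prev, curr):
        updated.append([
            tau if abs(c - p) >= diff_threshold else max(0, h - 1)
            for h, p, c in zip(mhi_row, prev_row, curr_row)
        ])
    return updated


def build_mhi(frames: Sequence[Frame], tau: int, diff_threshold: int) -> Frame:
    """Accumulate an MHI over consecutive frame pairs, starting from zeros."""
    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev, curr, tau, diff_threshold)
    return mhi


def refine_interest_points(points: Sequence[Tuple[int, int]], mhi: Frame,
                           min_value: int = 1) -> List[Tuple[int, int]]:
    """Keep (row, col) points lying on recent motion.

    Assumed criterion (not taken from the paper): MHI value >= min_value.
    """
    return [(r, c) for r, c in points if mhi[r][c] >= min_value]
```

Under this assumed criterion, points detected on a static background have an MHI value of zero and are discarded, while raising `min_value` restricts the surviving points to more recent motion.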

3D Features for human action recognition with semi-supervised learning
