Multi-view human action recognition using 2D motion templates based on MHIs and their HOG description

In this study, a new multi-view human action recognition approach is proposed that exploits low-dimensional motion information of actions. Before feature extraction, pre-processing steps are applied to remove noise from the silhouettes, incurred by imperfect, but realistic, segmentation. Two-dimensional motion templates based on the motion history image (MHI) are computed for each view/action video. Histograms of oriented gradients (HOGs) are used as an efficient description of the MHIs, which are classified using a nearest neighbor (NN) classifier. Compared with existing approaches, the proposed method has three advantages: (i) it does not require a fixed number of cameras to be set up during the training and testing stages, so missing camera views can be tolerated; (ii) it has lower memory and bandwidth requirements; and hence (iii) it is computationally efficient, which makes it suitable for real-time action recognition. To the best of the authors' knowledge, this is the first report of results on the MuHAVi-uncut dataset, which has a large number of action categories and a large set of camera views with noisy silhouettes, and which can serve as a baseline for future work to improve on. Experiments on this multi-view dataset give a high accuracy of 95.4% using leave-one-sequence-out cross-validation and compare well with similar state-of-the-art approaches.
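As a rough illustration of this pipeline (a minimal sketch, not the authors' implementation), the Python code below computes an MHI from a sequence of binary silhouette frames, describes it with a HOG vector and classifies it with a 1-NN rule. The frame-differencing scheme, the linear decay step, the resize to 128 x 128 pixels, the HOG parameters and the function names (motion_history_image, mhi_hog, fit_predict) are illustrative assumptions, not settings reported in the study.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.neighbors import KNeighborsClassifier


def motion_history_image(silhouettes, tau=255.0, delta=None):
    """Build a 2D motion history image from a list of binary silhouette
    frames (H x W arrays): recent motion is bright, older motion fades
    linearly, following the Bobick-Davis temporal-template idea."""
    if delta is None:
        delta = tau / max(len(silhouettes) - 1, 1)   # linear decay per frame
    mhi = np.zeros(silhouettes[0].shape, dtype=np.float32)
    prev = silhouettes[0].astype(bool)
    for sil in silhouettes[1:]:
        cur = sil.astype(bool)
        motion = np.logical_xor(cur, prev)            # pixels that changed
        mhi = np.where(motion, tau, np.maximum(mhi - delta, 0.0))
        prev = cur
    return mhi


def mhi_hog(mhi, size=(128, 128)):
    """Describe an MHI with a HOG vector (parameters are illustrative)."""
    return hog(resize(mhi, size), orientations=9,
               pixels_per_cell=(16, 16), cells_per_block=(2, 2),
               block_norm='L2-Hys')


def fit_predict(train_videos, y_train, test_videos):
    """1-NN classification of MHI-HOG descriptors; the video lists and
    labels are placeholders supplied by the caller."""
    X_train = np.array([mhi_hog(motion_history_image(v)) for v in train_videos])
    X_test = np.array([mhi_hog(motion_history_image(v)) for v in test_videos])
    clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
    return clf.predict(X_test)
```

Stamping the silhouette mask itself instead of the frame-to-frame XOR gives an alternative MHI variant in which the whole body region, rather than only the changed pixels, is refreshed at each frame; either choice still yields a single 2D template per view/action video.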
