© The Institution of Engineering and Technology
Existing studies in skeleton-based action recognition mainly utilise skeletal data taken from a single camera. Because the skeletal tracking of a single camera is noisy and unreliable, however, combining data from multiple cameras can improve the tracking quality and hence increase the recognition accuracy. In this study, the authors propose a method called weighted averaging fusion, which merges skeletal data from two or more camera views. The method first evaluates the reliability of each set of corresponding joints based on their distances to the centroid, then computes the weighted average of the selected joints, with each joint weighted by the overall reliability of the camera reporting it. The fused skeletal data thus obtained are used as the input to the action recognition step. Experiments using various frame-level features and testing schemes show that using these fused skeletal data improves the action recognition accuracy by more than 10% compared with the single-view case.
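The fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the reliability threshold, and the choice of measuring camera reliability as the fraction of joints lying near the per-joint centroid are all assumptions made for the example. It assumes the skeletons have already been registered into a common world coordinate frame.

```python
import numpy as np

def fuse_skeletons(skeletons, threshold=0.5):
    """Fuse corresponding skeletons from multiple cameras by weighted averaging.

    skeletons: array of shape (n_cams, n_joints, 3); each camera's joint
    positions, already transformed into a common world frame.
    threshold: illustrative cut-off (not from the paper) on the distance from
    a joint reading to the centroid of its corresponding joints.
    """
    skeletons = np.asarray(skeletons, dtype=float)

    # Per-joint centroid across cameras, and each camera's distance to it.
    centroids = skeletons.mean(axis=0)                     # (n_joints, 3)
    dists = np.linalg.norm(skeletons - centroids, axis=2)  # (n_cams, n_joints)

    # Select reliable joint readings: those close to the centroid.
    reliable = dists <= threshold                          # (n_cams, n_joints)

    # Overall reliability of a camera: fraction of its joints deemed reliable
    # (one plausible choice; the paper's exact weighting may differ).
    cam_weight = reliable.mean(axis=1)                     # (n_cams,)

    # Weighted average over the selected joints only.
    w = reliable * cam_weight[:, None]                     # (n_cams, n_joints)
    w_sum = w.sum(axis=0)
    weighted = (w[:, :, None] * skeletons).sum(axis=0)
    # Fall back to the plain centroid where no camera passed the test.
    fused = np.where(
        w_sum[:, None] > 0,
        weighted / np.maximum(w_sum, 1e-9)[:, None],
        centroids,
    )
    return fused
```

For example, with three cameras agreeing on a joint except for one outlier view, the outlier receives zero weight and the fused joint coincides with the agreeing pair, which is the behaviour the abstract attributes to the method.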
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2015.0146