Weighted averaging fusion for multi-view skeletal data and its application in action recognition

Existing studies in skeleton-based action recognition mainly utilise skeletal data captured by a single camera. However, since skeletal tracking from a single camera is noisy and unreliable, combining data from multiple cameras can improve the tracking quality and hence increase the recognition accuracy. In this study, the authors propose a method called weighted averaging fusion, which merges skeletal data from two or more camera views. The method first evaluates the reliability of each set of corresponding joints based on their distances to the centroid, then computes the weighted average of the selected joints, with each joint weighted by the overall reliability of the camera reporting it. The fused skeletal data thus obtained are used as the input to the action recognition step. Experiments using various frame-level features and testing schemes show that these fused skeletal data yield an improvement of more than 10% in action recognition accuracy compared with the single-view case.
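To make the fusion step described above concrete, here is a minimal NumPy sketch. It assumes the per-view skeletons have already been registered into a common world coordinate frame; the function name `fuse_skeletons`, the Gaussian form of the reliability weighting, and the scale parameter `sigma` are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def fuse_skeletons(views, sigma=0.1):
    """Fuse per-view skeletons into a single skeleton by weighted averaging.

    views : array of shape (n_views, n_joints, 3); one tracked skeleton per
            camera, already registered to a shared coordinate system.
    sigma : scale (assumed here to be in metres) controlling how quickly a
            joint's weight decays with distance from the cross-view centroid.

    Returns the fused skeleton, shape (n_joints, 3).
    """
    views = np.asarray(views, dtype=float)          # (V, J, 3)

    # Consensus position of each joint: its centroid across views.
    centroids = views.mean(axis=0)                  # (J, 3)

    # Per-joint reliability: a view that places a joint far from the
    # consensus is likely mistracked, so its weight is reduced.
    # (Gaussian decay is an illustrative choice, not the paper's formula.)
    dists = np.linalg.norm(views - centroids, axis=2)     # (V, J)
    joint_weights = np.exp(-(dists / sigma) ** 2)         # (V, J)

    # Overall reliability of each camera: mean reliability of its joints.
    cam_weights = joint_weights.mean(axis=1, keepdims=True)   # (V, 1)

    # Weighted average: each joint weighted by its camera's reliability.
    w = cam_weights[:, :, None]                     # (V, 1, 1)
    return (w * views).sum(axis=0) / w.sum(axis=0)

# Toy usage: two noisy views of a 20-joint skeleton.
rng = np.random.default_rng(0)
truth = rng.uniform(-1.0, 1.0, size=(20, 3))
views = truth + rng.normal(0.0, 0.02, size=(2, 20, 3))
fused = fuse_skeletons(views)
print(fused.shape)  # (20, 3)
```

In this sketch a camera whose joints consistently stray from the cross-view centroid receives a low overall weight, so its skeleton contributes less to the fused result, matching the intuition in the abstract that each joint is weighted by the reliability of the camera reporting it.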

Inspec keywords: object tracking; feature extraction; video cameras; image fusion; merging; pose estimation

Other keywords: skeletal data merging; skeletal data fusion; frame-level feature; camera view merging; skeletal tracking quality; weighted averaging fusion; skeleton-based action recognition; reliability evaluation

Subjects: Image recognition; Data handling techniques; Computer vision and image processing techniques
