Human motion analysis using three-dimensional (3D) data has attracted growing attention in computer vision since the introduction of cost-effective depth cameras such as the Kinect. This study aims to segment a continuous 3D skeletal sequence into several disjoint sub-sequences, each corresponding to a complete action. To address this problem, the authors propose a supervised time-series segmentation algorithm that uses a bidirectional propagation search scheme to reach a solution. Specifically, a human skeleton is formulated as a point in a multidimensional space, and a motion trajectory is represented as a sequence of such points. Each training action sequence serves as an atom in a dictionary, which is used by an l₂-regularised collaborative representation classifier. Because the reconstruction error of the collaborative representation measures the similarity between a test sub-sequence and the training sequences, they use its variation over time to detect action transitions. Cut-point detection and sub-sequence recognition are thus achieved simultaneously. Experiments on the authors' recorded 3D skeletal sequences demonstrate that the proposed algorithm outperforms existing human motion segmentation techniques. The algorithm can also be extended to segment sequences of other dimensionalities; this extensibility is validated by segmentation experiments on synthetic signals.
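
The classification step mentioned in the abstract can be illustrated with a minimal sketch of an l₂-regularised collaborative representation, which has a closed-form ridge solution. This is not the authors' implementation: the function name `crc_residuals`, the penalty value `lam`, and the assumption that every sequence has already been resampled to a fixed length and flattened into a vector are all hypothetical choices made for illustration, and the bidirectional propagation search is not shown.

```python
import numpy as np

def crc_residuals(D, labels, y, lam=0.1):
    """Class-wise reconstruction errors of y under an l2-regularised
    collaborative representation over the dictionary D.

    D      : (d, n) array, one flattened training sequence (atom) per column
    labels : (n,) array with the action label of each atom
    y      : (d,) flattened test sub-sequence
    lam    : ridge penalty (illustrative value, not taken from the paper)
    """
    n = D.shape[1]
    # Closed-form ridge solution: x = (D^T D + lam * I)^{-1} D^T y
    x = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        # Reconstruct y from the atoms of class c only and measure the error
        residuals[c.item()] = float(np.linalg.norm(y - D[:, mask] @ x[mask]))
    return residuals

# Toy data (synthetic, for illustration only): two action classes,
# each with five noisy copies of a random prototype vector.
rng = np.random.default_rng(0)
d, per_class = 60, 5
prototypes = rng.normal(size=(2, d))
atoms = [prototypes[c] + 0.05 * rng.normal(size=d)
         for c in (0, 1) for _ in range(per_class)]
D = np.stack(atoms, axis=1)            # shape (d, 10)
labels = np.repeat([0, 1], per_class)  # [0,0,0,0,0,1,1,1,1,1]

# A test vector drawn near the class-0 prototype
y = prototypes[0] + 0.05 * rng.normal(size=d)
res = crc_residuals(D, labels, y)
predicted = min(res, key=res.get)      # label with the smallest error
```

In a segmentation setting, one would presumably evaluate such errors on a window that grows or slides along the continuous sequence and look for points where the smallest error rises sharply, which is the kind of variation over time the abstract refers to. How the windows are chosen and thresholded here is left open, since the abstract does not specify it.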

Human motion segmentation using collaborative representations of 3D skeletal sequences
