In real-world human action recognition tasks, labelled samples are typically scarce while unlabelled samples are abundant. How to make good use of unlabelled samples to improve the generalisation ability of models is the focus of semi-supervised learning research. In this study, the authors present two semi-supervised methods based on long short-term memory (LSTM) that learn discriminative hidden features: the LSTM ladder network and the symmetrical LSTM network. With these methods, unlabelled samples can be exploited automatically to improve learning performance without relying on external interaction. On the NTU RGB+D and Kinetics datasets, the methods achieve improvements of more than 10% and 5%, respectively.

Semi-supervised long short-term memory for human action recognition
