Supervised framework for automatic recognition and retrieval of interactions (SAFARRI): a framework for classifying and retrieving videos with similar human interactions

This study presents the supervised framework for automatic recognition and retrieval of interactions (SAFARRI), a supervised learning framework that recognises interactions such as pushing, punching, and hugging between a pair of human performers in a video shot. The primary contribution of the study is to extend vectors of locally aggregated descriptors (VLADs), a compact and discriminative video encoding representation, to solve the complex class-partitioning problem of recognising human interactions. An initial codebook is generated from the training set of video shots by extracting feature descriptors around spatiotemporal interest points computed across frames. A bag of action words is then built by encoding the first-order statistics of the visual words using VLAD, and one-against-all support vector machine classifiers are trained on these encodings. The authors verify SAFARRI's accuracy for both classification and retrieval (query by example). SAFARRI requires no tracking or recognition of body parts and can identify the region of interaction in a video shot. It gives superior retrieval and classification performance over recently proposed methods on two publicly available human-interaction datasets.
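The VLAD encoding step described above can be sketched as follows. This is a minimal illustration in numpy, not the authors' implementation: the function name, the nearest-neighbour assignment, and the power/L2 normalisation (as in the "All about VLAD" variant) are assumptions for the sketch. Each video shot's local descriptors are assigned to their nearest visual word, and the first-order residuals are accumulated per word into a single fixed-length vector.

```python
import numpy as np

def vlad_encode(descriptors, codebook):
    """Encode a set of local descriptors into one VLAD vector.

    descriptors: (n, d) array of local feature descriptors from one video shot
    codebook:    (k, d) array of visual-word centres (e.g. k-means centroids)
    Returns a power- and L2-normalised vector of length k*d.
    """
    k, d = codebook.shape
    # assign each descriptor to its nearest visual word
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assignments = np.argmin(dists, axis=1)
    # accumulate first-order residuals (descriptor minus centre) per visual word
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assignments == i]
        if len(members):
            vlad[i] = (members - codebook[i]).sum(axis=0)
    vlad = vlad.ravel()
    # signed square-root (power) normalisation, then L2 normalisation
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

The resulting k*d-dimensional vectors would then serve as the fixed-length shot representations on which one-against-all SVM classifiers are trained, and as the index vectors compared for query-by-example retrieval.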

Inspec keywords: video retrieval; video coding; statistics; support vector machines; learning (artificial intelligence); pattern classification; vectors

Other keywords: supervised framework for automatic recognition and retrieval of interactions; discriminative video encoding representation; VLADs; complex class partitioning problem; bag of action words; supervised learning framework; SAFARRIs; video retrieval; vectors of locally aggregated descriptors; support vector machine classifiers; human interaction recognition; visual word first-order statistics; video classification

Subjects: Other topics in statistics; Algebra; Video signal processing; Knowledge engineering techniques; Image and video coding; Information retrieval techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2015.0189