Efficiently indexing and retrieving objects of interest from large-scale surveillance videos is an important and challenging problem. In this study, the authors present an effective multiple deep features learning approach for object retrieval in surveillance videos. Using discriminative convolutional neural networks (CNNs), they learn multiple deep features that together describe the visual object comprehensively. Specifically, they utilise a CNN model pre-trained on ImageNet ILSVRC12 and fine-tuned on their own dataset to extract structural information. In addition, they train a second CNN model, supervised by 11 colour names, to capture colour information. To improve retrieval performance, the deep features are encoded into short binary codes by locality-sensitive hashing and fused for fast retrieval of the object of interest. Retrieval experiments are performed on a dataset of 100k objects extracted from multi-camera surveillance videos. Comparisons with other common visual features show the effectiveness of the proposed approach.
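
The encoding and fusion step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes sign-of-random-projection hashing as the locality-sensitive hash, concatenation as the fusion rule, and Hamming distance for ranking, none of which the abstract specifies. The function names, feature dimensions, and code lengths are hypothetical, and random vectors stand in for the two CNN features.

```python
import numpy as np


def make_hyperplanes(dim, n_bits, seed):
    """Draw n_bits random hyperplanes for dim-dimensional features (assumed LSH family)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, n_bits))


def lsh_encode(features, hyperplanes):
    """Encode each feature row as one bit per hyperplane: 1 if the projection is positive."""
    return (features @ hyperplanes > 0).astype(np.uint8)


def fuse_codes(structure_code, colour_code):
    """Fuse two binary codes by concatenation (an assumed fusion rule)."""
    return np.concatenate([structure_code, colour_code], axis=-1)


def hamming_rank(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query, plus the distances."""
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    return np.argsort(distances, kind="stable"), distances


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for the structure and colour CNN features (hypothetical sizes).
    structure_feats = rng.standard_normal((1000, 256))
    colour_feats = rng.standard_normal((1000, 64))

    structure_planes = make_hyperplanes(256, 48, seed=1)
    colour_planes = make_hyperplanes(64, 16, seed=2)

    database = fuse_codes(
        lsh_encode(structure_feats, structure_planes),
        lsh_encode(colour_feats, colour_planes),
    )
    order, distances = hamming_rank(database[42], database)
    print(order[:5], distances[order[:5]])
```

Because each object is reduced to a short bit vector, ranking needs only bitwise comparisons rather than floating-point distances over the full features, which is the usual motivation for hashing in large-scale retrieval.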

Multiple deep features learning for object retrieval in surveillance videos
