Visual tracking via ensemble autoencoder

The authors present a novel online visual tracking algorithm based on an ensemble autoencoder (AE). In contrast to existing deep-model-based trackers, the proposed algorithm builds on the observation that image resolution influences vision procedures: when a deep neural network is employed to represent the object, the resolution corresponds to the network size. The authors therefore apply a small network to represent the pattern at a relatively low resolution and search for the object over a relatively large neighbourhood. After roughly estimating the object's location, they apply a large network, which provides more detailed information, to estimate the object's state more accurately. Thus, a small AE is employed mainly for position search and a larger one mainly for scale estimation. During tracking, the two networks interact within a particle-filtering framework. Extensive experiments on the benchmark dataset show that the proposed algorithm performs favourably against state-of-the-art methods.
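The abstract only outlines the coarse-to-fine interaction of the two networks, so a minimal illustrative sketch follows. It fakes the two autoencoders with synthetic scoring functions so the loop runs end to end: position particles are spread over a large neighbourhood and scored by a cheap coarse model (standing in for the small AE), then position and scale are refined around the coarse estimate with a finer model (standing in for the large AE). All names, noise parameters, and scoring functions here are hypothetical assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two autoencoders. In the paper each network
# would score a candidate patch by how well it reconstructs the learned target
# appearance; here a synthetic "true" state is used so the sketch is runnable.
TRUE_STATE = np.array([120.0, 80.0, 1.0])  # x, y, scale of a toy target

def small_ae_score(state):
    # Coarse network: cheap, position-sensitive, insensitive to scale.
    return -np.linalg.norm(state[:2] - TRUE_STATE[:2])

def large_ae_score(state):
    # Fine network: more detailed, also sensitive to scale.
    return -np.linalg.norm(state - TRUE_STATE)

def track_frame(prev_state, n_particles=200):
    # Stage 1: spread position particles over a relatively large
    # neighbourhood (scale held fixed) and score them with the small AE.
    pos = prev_state + rng.normal(0.0, [15.0, 15.0, 0.0], (n_particles, 3))
    w = np.exp([small_ae_score(p) for p in pos])
    w /= w.sum()
    coarse = (w[:, None] * pos).sum(axis=0)  # rough location estimate

    # Stage 2: sample position *and* scale tightly around the coarse
    # estimate and refine with the large AE.
    fine = coarse + rng.normal(0.0, [3.0, 3.0, 0.05], (n_particles, 3))
    w = np.exp([large_ae_score(p) for p in fine])
    w /= w.sum()
    return (w[:, None] * fine).sum(axis=0)  # refined state estimate

state = np.array([100.0, 100.0, 1.0])  # initial guess: x, y, scale
for t in range(5):
    state = track_frame(state)
    print(f"frame {t}: x={state[0]:.1f} y={state[1]:.1f} s={state[2]:.2f}")

The point of the two-stage split is that the expensive, detailed model is only ever evaluated in a tight neighbourhood of the coarse estimate, which is what keeps the large network affordable inside a particle filter.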

Inspec keywords: object tracking; image resolution; neural nets; particle filtering (numerical methods); image filtering

Other keywords: benchmark dataset; object tracking; ensemble autoencoder; small AE; particle filtering; deep model based trackers; network size; vision procedures; deep neural network; online visual tracking algorithm; image resolution

Subjects: Computer vision and image processing techniques; Optical, image and video signal processing; Neural computing techniques; Filtering methods in signal processing
