Visual tracking via ensemble autoencoder

IET Image Processing

The authors present a novel online visual tracking algorithm based on an ensemble of autoencoders (AEs). In contrast to other existing deep-model-based trackers, the proposed algorithm builds on the observation that image resolution influences vision procedures. When a deep neural network is used to represent the object, the resolution corresponds to the network size. The authors apply a small network to represent the object pattern at a relatively low resolution and search for the object over a relatively large neighbourhood. After roughly estimating the object's location, they apply a large network, which provides more detailed information, to estimate the object's state more accurately. Thus, a small AE is employed mainly for position search and a larger one mainly for scale estimation. During tracking, the two networks interact within a particle-filtering framework. Extensive experiments on the benchmark dataset show that the proposed algorithm performs favourably against state-of-the-art methods.
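The coarse-to-fine, two-network scheme the abstract describes can be sketched roughly as follows. This is only a structural illustration, not the authors' implementation: the `TinyAE` class (a linear projection whose reconstruction error acts as a confidence score), the `crop` helper, and all particle counts, search radii, and patch sizes are invented stand-ins for the trained autoencoders and tuned parameters in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyAE:
    """Toy stand-in for a trained autoencoder: a fixed random linear
    projection; negative reconstruction error serves as the score."""
    def __init__(self, in_dim, hid_dim):
        self.W = rng.standard_normal((in_dim, hid_dim)) / np.sqrt(in_dim)
    def score(self, patch):
        x = patch.ravel()
        h = x @ self.W          # encode
        x_hat = h @ self.W.T    # decode
        return -np.sum((x - x_hat) ** 2)   # higher = better fit

def crop(frame, cx, cy, size, out):
    """Square patch centred at (cx, cy), zero-padded at borders,
    downsampled to out x out by nearest-neighbour striding."""
    half = size // 2
    y0, x0 = int(cy - half), int(cx - half)
    patch = frame[max(y0, 0):y0 + size, max(x0, 0):x0 + size]
    pad = np.zeros((size, size))
    pad[:patch.shape[0], :patch.shape[1]] = patch
    idx = np.arange(out) * size // out
    return pad[np.ix_(idx, idx)]

def track_step(frame, state, ae_small, ae_large, n_particles=200):
    """One frame of coarse-to-fine tracking; state = (cx, cy, scale)."""
    cx, cy, s = state
    # Stage 1: small AE, low resolution, wide search over position only.
    cand = np.column_stack([cx + rng.normal(0, 8, n_particles),
                            cy + rng.normal(0, 8, n_particles)])
    scores = [ae_small.score(crop(frame, x, y, int(s), 8)) for x, y in cand]
    cx, cy = cand[int(np.argmax(scores))]
    # Stage 2: large AE, higher resolution, narrow search over
    # position and scale around the coarse estimate.
    cand2 = np.column_stack([cx + rng.normal(0, 2, n_particles),
                             cy + rng.normal(0, 2, n_particles),
                             s * (1 + rng.normal(0, 0.05, n_particles))])
    scores2 = [ae_large.score(crop(frame, x, y, int(sc), 16))
               for x, y, sc in cand2]
    x, y, sc = cand2[int(np.argmax(scores2))]
    return (x, y, sc)
```

The two-stage structure mirrors the abstract: a cheap low-resolution score over many widely spread particles fixes the position, after which a richer high-resolution score over a tight particle cloud refines both position and scale.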


