Visual tracking based on semantic and similarity learning

The authors present a method that combines the similarity and semantic features of a target to improve tracking performance in video sequences. Trackers based on Siamese networks have achieved success in recent competitions and on standard benchmarks by learning similarity from binary labels. Unfortunately, such weak labels limit the discriminative ability of the learned features, making it difficult to distinguish the target itself from distractors of the same class. The authors observe that inter-class semantic features help to increase the separation between the target and the background, including distractors. They therefore propose a network architecture that uses both a similarity branch and a semantic branch to obtain more discriminative features for accurately locating the target in new frames. The large-scale ImageNet VID dataset is used to train the network. Even in the presence of background clutter, visual distortion, and distractors, the proposed method keeps following the target. The method is evaluated on the open benchmarks OTB and UAV123, and the results show that the combined approach significantly improves tracking performance relative to trackers that use similarity or semantic features alone.
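To make the two-branch idea concrete, the following is a minimal PyTorch sketch of a Siamese tracker that fuses a similarity (appearance) response map with a semantic response map, in the spirit of what the abstract describes. It is not the authors' implementation: the branch layouts, feature dimensions, input sizes, and the scalar fusion weight are illustrative assumptions.

import torch
import torch.nn.functional as F
from torch import nn


def xcorr(search_feat, exemplar_feat):
    """SiamFC-style cross-correlation: slide the exemplar embedding over
    the search embedding to produce a response map (batch size 1)."""
    return F.conv2d(search_feat, exemplar_feat)


class TwoBranchTracker(nn.Module):
    def __init__(self, fusion_weight=0.5):
        super().__init__()
        # Similarity branch: trained with pairwise (binary) labels.
        self.sim_branch = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2), nn.ReLU(),
            nn.Conv2d(128, 256, 3),
        )
        # Semantic branch: features intended to separate object classes,
        # e.g. taken from a classification network, so that same-class
        # distractors are pushed away from the target.
        self.sem_branch = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2), nn.ReLU(),
            nn.Conv2d(128, 256, 3),
        )
        self.fusion_weight = fusion_weight  # assumed scalar fusion

    def forward(self, exemplar, search):
        sim_resp = xcorr(self.sim_branch(search), self.sim_branch(exemplar))
        sem_resp = xcorr(self.sem_branch(search), self.sem_branch(exemplar))
        # Fuse the two response maps; the peak gives the target location.
        w = self.fusion_weight
        return w * sim_resp + (1.0 - w) * sem_resp


# Usage: a 127x127 exemplar and a 255x255 search region, batch size 1.
tracker = TwoBranchTracker()
exemplar = torch.randn(1, 3, 127, 127)
search = torch.randn(1, 3, 255, 255)
response = tracker(exemplar, search)  # argmax of the map -> target position

Fusing at the response-map level (rather than concatenating features) keeps the two cues independent, so a same-class distractor that scores high on similarity can still be suppressed by a low semantic response.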

Inspec keywords: image sequences; learning (artificial intelligence); video signal processing; object tracking; feature extraction

Other keywords: large-scale ImageNet VID dataset; video sequences; Siamese networks; visual tracking; network architecture; visual distortion; semantic branches; tracking performance; tracking ability; binary labels; discriminative features; inter-class semantic features; learning similarity; target accuracy; discriminative ability; weak labels

Subjects: Computer vision and image processing techniques; Video signal processing; Knowledge engineering techniques; Image recognition
