Fine-grained recognition of maritime vessels and land vehicles by deep feature embedding

Recent advances in large-scale image and video analysis have expanded the capabilities of visual surveillance systems. In particular, deep learning-based approaches bring substantial benefits to computer vision problems such as fine-grained object recognition. Here, the authors concentrate on the classification and identification of maritime vessels and land vehicles, which are key constituents of visual surveillance systems. Employing publicly available data sets for maritime vessels and land vehicles, the authors aim to improve visual recognition. Specifically, the authors focus on five visual recognition tasks: coarse-grained classification, fine-grained classification, coarse-grained retrieval, fine-grained retrieval, and verification. To increase performance in these tasks, the authors utilise a multi-task learning framework and present a novel loss function that simultaneously considers deep feature learning and classification by exploiting the available hierarchical labels of individual samples and the global statistics of distances between data pairs. The authors observe that the proposed multi-task learning model improves fine-grained recognition performance on the MARVEL and Stanford Cars data sets compared with training a model that targets a single recognition task.
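
The abstract does not give the form of the loss, so the following is only a minimal sketch of how a multi-task objective of this kind might be assembled, not the authors' formulation. It assumes two softmax cross-entropy terms (one for coarse labels, one for fine labels) plus an embedding term built from global statistics of pairwise distances: the variances of same-class and different-class distances and a hinge on the gap between their means. The function names, the weights `w_coarse`, `w_fine`, `w_embed`, and the `margin` and `lam` parameters are all hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample, computed with a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_and_variance(values):
    """Population mean and variance of a non-empty list of numbers."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, var


def global_pair_loss(embeddings, fine_labels, margin=1.0, lam=1.0):
    """Assumed embedding term based on global statistics of pair distances.

    Penalises the spread of same-class and different-class distances and
    requires the mean different-class distance to exceed the mean
    same-class distance by at least `margin`.
    """
    pos, neg = [], []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = squared_distance(embeddings[i], embeddings[j])
            (pos if fine_labels[i] == fine_labels[j] else neg).append(d)
    if not pos or not neg:
        return 0.0  # the statistics are undefined without both pair types
    mu_p, var_p = mean_and_variance(pos)
    mu_n, var_n = mean_and_variance(neg)
    return var_p + var_n + lam * max(0.0, mu_p - mu_n + margin)


def multi_task_loss(coarse_logits, fine_logits, embeddings,
                    coarse_labels, fine_labels,
                    w_coarse=1.0, w_fine=1.0, w_embed=1.0):
    """Weighted sum of the two classification terms and the embedding term."""
    n = len(embeddings)
    ce_coarse = sum(softmax_cross_entropy(z, y)
                    for z, y in zip(coarse_logits, coarse_labels)) / n
    ce_fine = sum(softmax_cross_entropy(z, y)
                  for z, y in zip(fine_logits, fine_labels)) / n
    pair = global_pair_loss(embeddings, fine_labels)
    return w_coarse * ce_coarse + w_fine * ce_fine + w_embed * pair


if __name__ == "__main__":
    # Toy batch: four samples, two fine classes, well-separated embeddings.
    emb = [[0.0, 0.0], [0.0, 0.0], [10.0, 0.0], [10.0, 0.0]]
    logits = [[0.0, 0.0]] * 4
    print(multi_task_loss(logits, logits, emb, [0, 0, 1, 1], [0, 0, 1, 1]))
```

In this sketch the embedding term vanishes when same-class samples are tightly grouped and classes are far apart, so only the classification terms remain. In a real training setup the same quantities would be computed per mini-batch in an automatic-differentiation framework.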

