
Multi-level image representation for large-scale image-based instance retrieval

In recent years, instance-level image retrieval has attracted considerable attention. Several researchers have proposed that the representations learned by a convolutional neural network (CNN) can be used for image retrieval tasks. In this study, the authors propose an effective feature encoder to extract robust information from a CNN. It consists of two main steps: the embedding step and the aggregation step. Moreover, they apply a multi-task loss function to make the training process more effective. Finally, this study proposes a novel representation policy that encodes feature vectors extracted from different layers to capture both local patterns and semantic concepts from a deep CNN. They call this ‘multi-level image representation’, and it can further improve retrieval performance. To evaluate their approach comprehensively, they conducted ablation experiments with various CNN architectures. Furthermore, they apply their approach to a concrete challenge, the Alibaba large-scale search challenge. The results show that their model is effective and competitive.
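
The idea of combining features from several layers can be illustrated with a minimal sketch. The abstract does not specify the embedding or aggregation operators, so this example makes its own assumptions: each layer's feature map is given as a list of local descriptors, aggregation is plain average pooling, and the per-layer vectors are L2-normalised and concatenated. All function names here are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def aggregate(feature_map):
    """Average-pool a feature map into one vector.

    feature_map: list of local descriptors (one per spatial location),
    each a list of C channel activations. Average pooling is an assumed
    stand-in for the paper's aggregation step.
    """
    count = len(feature_map)
    channels = len(feature_map[0])
    return [sum(d[c] for d in feature_map) / count for c in range(channels)]


def multi_level_representation(feature_maps):
    """Concatenate normalised, pooled features from several layers."""
    parts = []
    for fmap in feature_maps:
        parts.extend(l2_normalize(aggregate(fmap)))
    # Normalise again so that a dot product between two images is a cosine similarity.
    return l2_normalize(parts)


def cosine_similarity(a, b):
    """Dot product of two unit-norm vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy input: a shallow layer with 2 channels and a deep layer with 3 channels.
shallow = [[1.0, 0.0], [3.0, 0.0]]
deep = [[0.0, 2.0, 0.0]]
representation = multi_level_representation([shallow, deep])
```

Normalising each layer before concatenation keeps a layer with large activations from dominating the final descriptor; whether the authors weight the layers this way is not stated in the abstract.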

http://iet.metastore.ingenta.com/content/journals/10.1049/trit.2018.0003