Multi-level image representation for large-scale image-based instance retrieval

Qili Deng; Shuai Wu; Jie Wen; Yong Xu

Affiliation (all authors): Bio-computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518055, People's Republic of China
Source: Volume 3, Issue 1, March 2018, pp. 33–39
DOI: 10.1049/trit.2018.0003, Online ISSN 2468-2322

This is an open access article published by the IET, Chinese Association for Artificial Intelligence and Chongqing University of Technology under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/)

Received 25/12/2017, Revised 03/02/2018, Accepted 15/02/2018, Published 05/03/2018

In recent years, instance-level image retrieval has attracted considerable attention. Several researchers have proposed that the representations learned by convolutional neural networks (CNNs) can be used for the image retrieval task. In this study, the authors propose an effective feature encoder that extracts robust information from a CNN. The encoder consists of two main steps: an embedding step and an aggregation step. Moreover, they train their model with a multi-task loss function to make the training process more effective. Finally, this study proposes a novel representation policy that encodes feature vectors extracted from different layers of a deep CNN, capturing both local patterns and semantic concepts. The authors call this ‘multi-level image representation’, and it can further improve retrieval performance. To evaluate the approach comprehensively, they conducted ablation experiments with various CNN architectures. Furthermore, they applied the approach to a concrete task, the Alibaba Large-scale Image Search Challenge. The results show that their model is effective and competitive.
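The abstract does not specify how the encoder works internally. As a rough illustration of the multi-level idea only, the sketch below assumes that each selected layer's feature map is pooled per channel (max or sum pooling, a stand-in aggregation that is not necessarily the authors' choice), L2-normalised per layer, concatenated across layers, normalised again, and compared by cosine similarity. The embedding step and the multi-task loss are not modelled, and all function names and toy values are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: a feature map is a list of channels, each channel a flat list
# of activations over its H*W spatial positions.
FeatureMap = Sequence[Sequence[float]]


def l2_normalize(vec: List[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit Euclidean length (the zero vector stays zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def aggregate(fmap: FeatureMap, mode: str = "max") -> List[float]:
    """Hypothetical aggregation step: pool each channel over all positions."""
    if mode == "max":
        return [max(channel) for channel in fmap]
    if mode == "sum":
        return [sum(channel) for channel in fmap]
    raise ValueError(f"unknown pooling mode: {mode}")


def multi_level_descriptor(fmaps: Sequence[FeatureMap], mode: str = "max") -> List[float]:
    """Encode several layers (low to high) into one global descriptor.

    Each layer is pooled and L2-normalised separately, so no layer dominates
    merely because of its activation scale; the per-layer vectors are then
    concatenated and the result is L2-normalised again.
    """
    parts: List[float] = []
    for fmap in fmaps:
        parts.extend(l2_normalize(aggregate(fmap, mode)))
    return l2_normalize(parts)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product, equal to cosine similarity for unit-length descriptors."""
    return sum(x * y for x, y in zip(a, b))


def rank(query: List[float], database: List[List[float]]) -> List[int]:
    """Return database indices sorted by decreasing similarity to the query."""
    scores = [cosine(query, d) for d in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy layers: a "low" layer (2 channels x 4 positions) and a
    # "high" layer (3 channels x 1 position); values are made up.
    low = [[0.1, 0.9, 0.2, 0.0], [0.5, 0.4, 0.3, 0.2]]
    high = [[1.0], [0.0], [2.0]]
    query = multi_level_descriptor([low, high])
    other = multi_level_descriptor(
        [[[0.0, 0.1, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]], [[0.0], [1.0], [0.0]]]
    )
    print(rank(query, [other, query]))  # the identical image ranks first: [1, 0]
```

Normalising each layer before concatenation is one simple way to balance local (lower-layer) and semantic (higher-layer) evidence. Whether the paper weights or normalises layers this way is not stated in the abstract.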
