Learning DALTS for cross-modal retrieval

Cross-modal retrieval aims to find an appropriate subspace in which the similarity across different modalities, such as image and text, can be directly measured. In this study, unlike most existing works, the authors propose a novel model for cross-modal retrieval based on a domain-adaptive limited text space (DALTS) rather than a common space or an image space. Experimental results on three widely used datasets, Flickr8K, Flickr30K and Microsoft Common Objects in Context (MSCOCO), show that the proposed method, dubbed DALTS, learns superior text-space features that effectively capture the information necessary for cross-modal retrieval. Moreover, DALTS achieves promising improvements in retrieval accuracy over current state-of-the-art methods.
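The core idea, mapping images into a text space and ranking image-caption pairs by similarity measured directly in that space, can be illustrated with a minimal sketch. The PyTorch code below is a hypothetical illustration, not the authors' actual DALTS architecture: the encoder names (ImageToTextMapper, TextEncoder), the feature dimensions, and the simple linear/GRU design are all assumptions made for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageToTextMapper(nn.Module):
    """Projects pre-extracted image features (e.g. CNN outputs) into the text space.
    Hypothetical stand-in for the paper's image-to-text-space mapping."""
    def __init__(self, img_dim=2048, txt_dim=1024):
        super().__init__()
        self.fc = nn.Linear(img_dim, txt_dim)

    def forward(self, img_feats):
        # L2-normalise so that a dot product equals cosine similarity.
        return F.normalize(self.fc(img_feats), dim=-1)

class TextEncoder(nn.Module):
    """Encodes token-id sequences into the text space with a GRU (assumed design)."""
    def __init__(self, vocab_size=10000, emb_dim=300, txt_dim=1024):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, txt_dim, batch_first=True)

    def forward(self, token_ids):
        _, h = self.gru(self.emb(token_ids))   # h: (num_layers, batch, txt_dim)
        return F.normalize(h[-1], dim=-1)      # final hidden state as sentence vector

# Retrieval: rank candidate captions for each query image by cosine similarity.
img_enc, txt_enc = ImageToTextMapper(), TextEncoder()
images = torch.randn(4, 2048)                  # 4 query image feature vectors
captions = torch.randint(0, 10000, (100, 20))  # 100 candidate captions (token ids)
sim = img_enc(images) @ txt_enc(captions).t()  # (4, 100) cosine similarities
ranks = sim.argsort(dim=1, descending=True)    # best-matching captions first
print(ranks[:, :5])                            # top-5 retrieved caption indices per image

In practice, such encoders would be trained with a ranking or contrastive loss over matched image-caption pairs; the normalisation step above is a common design choice that makes retrieval a simple nearest-neighbour search in the learned space.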

Inspec keywords: image segmentation; information retrieval; image retrieval; recurrent neural nets; natural language processing; text analysis

Other keywords: image space; domain-adaptive limited text space; Flickr30K; text space features; Flickr8K; DALTS; MSCOCO; cross-modal retrieval

Subjects: Computer vision and image processing techniques; Optical, image and video signal processing; Natural language interfaces; Information retrieval techniques; Document processing and analysis techniques; Neural computing techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/trit.2018.1051