Learning DALTS for cross-modal retrieval

Cross-modal retrieval aims to find an appropriate subspace in which the similarity across different modalities, such as image and text, can be directly measured. In this study, unlike most existing works, the authors propose a novel model for cross-modal retrieval based on a domain-adaptive limited text space (DALTS) rather than a common space or an image space. Experimental results on three widely used datasets, Flickr8K, Flickr30K and Microsoft Common Objects in Context (MSCOCO), show that the proposed method, dubbed DALTS, learns superior text-space features that effectively capture the information necessary for cross-modal retrieval. Moreover, DALTS achieves promising improvements in retrieval accuracy over current state-of-the-art methods.
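The core idea, mapping images into a text space and ranking image-caption pairs by similarity measured directly in that space, can be illustrated with a minimal sketch. The PyTorch code below is a hypothetical illustration, not the authors' actual DALTS architecture: the encoder names (ImageToTextMapper, TextEncoder), the feature dimensions, and the simple linear/GRU design are all assumptions made for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageToTextMapper(nn.Module):
    """Projects pre-extracted image features (e.g. CNN outputs) into the text space.
    Hypothetical stand-in for the paper's image-to-text-space mapping."""
    def __init__(self, img_dim=2048, txt_dim=1024):
        super().__init__()
        self.fc = nn.Linear(img_dim, txt_dim)

    def forward(self, img_feats):
        # L2-normalise so that a dot product equals cosine similarity.
        return F.normalize(self.fc(img_feats), dim=-1)

class TextEncoder(nn.Module):
    """Encodes token-id sequences into the text space with a GRU (assumed design)."""
    def __init__(self, vocab_size=10000, emb_dim=300, txt_dim=1024):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, txt_dim, batch_first=True)

    def forward(self, token_ids):
        _, h = self.gru(self.emb(token_ids))   # h: (num_layers, batch, txt_dim)
        return F.normalize(h[-1], dim=-1)      # final hidden state as sentence vector

# Retrieval: rank candidate captions for each query image by cosine similarity.
img_enc, txt_enc = ImageToTextMapper(), TextEncoder()
images = torch.randn(4, 2048)                  # 4 query image feature vectors
captions = torch.randint(0, 10000, (100, 20))  # 100 candidate captions (token ids)
sim = img_enc(images) @ txt_enc(captions).t()  # (4, 100) cosine similarities
ranks = sim.argsort(dim=1, descending=True)    # best-matching captions first
print(ranks[:, :5])                            # top-5 retrieved caption indices per image

In practice, such encoders would be trained with a ranking or contrastive loss over matched image-caption pairs; the normalisation step above is a common design choice that makes retrieval a simple nearest-neighbour search in the learned space.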

Inspec keywords: image segmentation; information retrieval; image retrieval; recurrent neural nets; natural language processing; text analysis

Other keywords: image space; domain-adaptive limited text space; Flickr30K; text space features; Flickr8K; DALTS; MSCOCO; cross-modal retrieval

Subjects: Computer vision and image processing techniques; Optical, image and video signal processing; Natural language interfaces; Information retrieval techniques; Document processing and analysis techniques; Neural computing techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/trit.2018.1051