Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with
Transformers

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

30 March 2021

Jean-Baptiste Alayrac

Andrew Zisserman

Papers citing "Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers"

12 / 12 papers shown

Title
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval Guanqi Zhan Yuanpei Liu Kai Han Weidi Xie Andrew Zisserman VLM 60 0 0 21 Feb 2025
Are Diffusion Models Vision-And-Language Reasoners? Benno Krojer Elinor Poole-Dayan Vikram S. Voleti Christopher Pal Siva Reddy 16 12 0 25 May 2023
Improving Cross-Modal Retrieval with Set of Diverse Embeddings Dongwon Kim Nam-Won Kim Suha Kwak 8 37 0 30 Nov 2022
Cross-Modal Adapter for Text-Video Retrieval Haojun Jiang Jianke Zhang Rui Huang Chunjiang Ge Zanlin Ni Jiwen Lu Jie Zhou S. Song Gao Huang 38 35 0 17 Nov 2022
Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval Abhra Chaudhuri Massimiliano Mancini Yanbei Chen Zeynep Akata Anjan Dutta 8 5 0 19 Oct 2022
Video Question Answering with Iterative Video-Text Co-Tokenization A. Piergiovanni K. Morton Weicheng Kuo Michael S. Ryoo A. Angelova 10 17 0 01 Aug 2022
Multimodal Learning with Transformers: A Survey P. Xu Xiatian Zhu David A. Clifton ViT 41 518 0 13 Jun 2022
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval S. Gorti Noël Vouitsis Junwei Ma Keyvan Golestan M. Volkovs Animesh Garg Guangwei Yu 14 148 0 28 Mar 2022
Video Transformers: A Survey Javier Selva A. S. Johansen Sergio Escalera Kamal Nasrollahi T. Moeslund Albert Clapés ViT 20 101 0 16 Jan 2022
Unified Vision-Language Pre-Training for Image Captioning and VQA Luowei Zhou Hamid Palangi Lei Zhang Houdong Hu Jason J. Corso Jianfeng Gao MLLM VLM 250 922 0 24 Sep 2019
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation Vijay Badrinarayanan Alex Kendall R. Cipolla SSeg 420 15,438 0 02 Nov 2015
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics Yunchao Gong Qifa Ke Michael Isard Svetlana Lazebnik 3DV 58 583 0 18 Dec 2012