CoVR: Learning Composed Video Retrieval from Web Video Captions
28 August 2023
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
arXiv: 2308.14746
Papers citing "CoVR: Learning Composed Video Retrieval from Web Video Captions" (13 papers)
From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval
Yabing Wang
Zhuotao Tian
Qingpei Guo
Zheng Qin
Sanping Zhou
Ming Yang
Le Wang
61
0
0
25 Apr 2025
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
99
0
0
21 Feb 2025
Triplet Synthesis For Enhancing Composed Image Retrieval via Counterfactual Image Generation
Kenta Uesugi
Naoki Saito
Keisuke Maeda
Takahiro Ogawa
Miki Haseyama
33
0
0
22 Jan 2025
Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval
Junyang Chen
Hanjiang Lai
VLM
34
15
0
13 Nov 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
259
4,223
0
30 Jan 2023
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
115
61
0
17 May 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,110
0
28 Jan 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
245
557
0
28 Sep 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
240
577
0
22 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
309
778
0
18 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
926
0
24 Sep 2019
Dialog-based Interactive Image Retrieval
Xiaoxiao Guo
Hui Wu
Yu Cheng
Steven J. Rennie
Gerald Tesauro
Rogerio Feris
48
204
0
01 May 2018