2106.11097
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP
21 June 2021
Han Fang
Pengfei Xiong
Luhui Xu
Yu Chen
CLIP
VLM
Papers citing
"CLIP2Video: Mastering Video-Text Retrieval via Image CLIP"
50 / 189 papers shown
Title
Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval
Shunsuke Tsubaki
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
Keisuke Imoto
26
1
0
16 Mar 2024
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
Wenhao Wang
Yi Yang
VGen
DiffM
33
32
0
10 Mar 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
43
29
0
20 Feb 2024
M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval
Xingning Dong
Zipeng Feng
Chunluan Zhou
Xuzheng Yu
Ming Yang
Qingpei Guo
VLM
39
2
0
31 Jan 2024
SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks
Xingning Dong
Qingpei Guo
Tian Gan
Qing Wang
Jianlong Wu
Xiangyuan Ren
Yuan Cheng
Wei Chu
34
5
0
31 Jan 2024
Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation
Yuanhuiyi Lyu
Xueye Zheng
Lin Wang
DiffM
41
11
0
31 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
52
179
0
24 Jan 2024
Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)
Shih-Han Chou
Matthew Kowal
Yasmin Niknam
Diana Moyano
Shayaan Mehdi
...
Cheng Zhang
Ian Knopke
S. Kocak
Leonid Sigal
Yalda Mohsenzadeh
38
1
0
23 Jan 2024
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval
Xiangpeng Yang
Linchao Zhu
Xiaohan Wang
Yi Yang
VLM
34
23
0
19 Jan 2024
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
Xu Yan
Haiming Zhang
Yingjie Cai
Jingming Guo
Weichao Qiu
...
Lihui Jiang
Wei Zhang
Hongbo Zhang
Dengxin Dai
Bingbing Liu
56
17
0
16 Jan 2024
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
S. DarshanSingh
Zeeshan Khan
Makarand Tapaswi
VLM
CLIP
36
3
0
15 Jan 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
42
6
0
08 Jan 2024
Detours for Navigating Instructional Videos
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
29
6
0
03 Jan 2024
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
Kaibin Tian
Yanhua Cheng
Yi Liu
Xinglin Hou
Quan Chen
Han Li
27
3
0
01 Jan 2024
COMMA: Co-Articulated Multi-Modal Learning
Lianyu Hu
Liqing Gao
Zekang Liu
Chi-Man Pun
Wei Feng
VLM
20
0
0
30 Dec 2023
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis
Zhaoyan Liu
S. Gorti
Valentin Villecroze
Jesse C. Cresswell
Guangwei Yu
G. Loaiza-Ganem
M. Volkovs
51
3
0
15 Dec 2023
WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge
Huy Le
Tung Kieu
Anh Nguyen
Ngan Le
VGen
29
1
0
15 Dec 2023
ViLA: Efficient Video-Language Alignment for Video Question Answering
Xijun Wang
Junbang Liang
Chun-Kai Wang
Kenan Deng
Yu Lou
Ming-Chyuan Lin
Shan Yang
34
13
0
13 Dec 2023
Prompt2NeRF-PIL: Fast NeRF Generation via Pretrained Implicit Latent
Jianmeng Liu
Yuyao Zhang
Zeyuan Meng
Yu-Wing Tai
Chi-Keung Tang
VLM
DiffM
AI4CE
30
1
0
05 Dec 2023
Explaining CLIP's performance disparities on data from blind/low vision users
Daniela Massiceti
Camilla Longden
Agnieszka Slowik
Samuel Wills
Martin Grayson
C. Morrison
VLM
26
9
0
29 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
21
18
0
27 Nov 2023
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao
Wenhao Wu
Zhiheng Li
VLM
95
9
0
27 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
31
3
0
25 Nov 2023
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal
Yonatan Bitton
Idan Szpektor
Kai-Wei Chang
Aditya Grover
40
14
0
15 Nov 2023
VGSG: Vision-Guided Semantic-Group Network for Text-based Person Search
Shuting He
Hao Luo
Wei Jiang
Xudong Jiang
Henghui Ding
24
39
0
13 Nov 2023
Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval
Junkyu Jang
Eugene Hwang
Sung-Hyuk Park
28
0
0
03 Nov 2023
Harvest Video Foundation Models via Efficient Post-Pretraining
Yizhuo Li
Kunchang Li
Yinan He
Yi Wang
Yali Wang
Limin Wang
Yu Qiao
Ping Luo
CLIP
VLM
VGen
54
2
0
30 Oct 2023
Videoprompter: an ensemble of foundational models for zero-shot video understanding
Adeel Yousaf
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Mubarak Shah
VLM
38
2
0
23 Oct 2023
Encoding and Decoding Narratives: Datafication and Alternative Access Models for Audiovisual Archives
Yuchen Yang
35
1
0
10 Oct 2023
Write What You Want: Applying Text-to-video Retrieval to Audiovisual Archives
Yuchen Yang
VGen
19
7
0
09 Oct 2023
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval
Hao Li
Marie-Jeanne Lesot
Lianli Gao
Xiaosu Zhu
Christophe Marsala
EDL
16
11
0
29 Sep 2023
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Ruyang Liu
Chen Li
Yixiao Ge
Ying Shan
Thomas H. Li
Ge Li
25
28
0
27 Sep 2023
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning
Chen Jiang
Hong Liu
Xuzheng Yu
Qing Wang
Yuan Cheng
...
Zhongyi Liu
Qingpei Guo
Wei Chu
Ming Yang
Yuan Qi
29
10
0
20 Sep 2023
Unified Coarse-to-Fine Alignment for Video-Text Retrieval
Ziyang Wang
Yi-Lin Sung
Feng Cheng
Gedas Bertasius
Joey Tianyi Zhou
101
44
0
18 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
33
3
0
16 Sep 2023
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
Ziyu Guo
Renrui Zhang
Xiangyang Zhu
Yiwen Tang
Xianzheng Ma
...
Ke Chen
Peng Gao
Xianzhi Li
Hongsheng Li
Pheng-Ann Heng
MLLM
35
125
0
01 Sep 2023
CoVR: Learning Composed Video Retrieval from Web Video Captions
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
22
21
0
28 Aug 2023
Multi-event Video-Text Retrieval
Gengyuan Zhang
Jisen Ren
Jindong Gu
Volker Tresp
19
13
0
22 Aug 2023
Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features
Alberto Baldrati
Marco Bertini
Tiberio Uricchio
A. Bimbo
CLIP
CoGe
13
29
0
22 Aug 2023
ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights
Weixian Lei
Yixiao Ge
Jianfeng Zhang
Dylan Sun
Kun Yi
Ying Shan
Mike Zheng Shou
33
1
0
20 Aug 2023
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
Chaorui Deng
Qi Chen
Pengda Qin
Dave Zhenyu Chen
Qi Wu
VLM
CLIP
46
29
0
15 Aug 2023
TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval
Kaibin Tian
Rui Zhao
Hu Hu
Runquan Xie
Fengzong Lian
Zhanhui Kang
Xirong Li
CLIP
27
0
0
02 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
38
118
0
25 Jul 2023
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Kumar Ashutosh
Santhosh Kumar Ramakrishnan
Triantafyllos Afouras
Kristen Grauman
26
24
0
17 Jul 2023
Fine-grained Text-Video Retrieval with Frozen Image Encoders
Zuozhuo Dai
Fang Shao
Qingkun Su
Zilong Dong
Siyu Zhu
167
1
0
14 Jul 2023
MultiVENT: Multilingual Videos of Events with Aligned Natural Text
Kate Sanders
David Etter
Reno Kriz
Benjamin Van Durme
VGen
42
7
0
06 Jul 2023
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
Avinash Madasu
Vasudev Lal
CoGe
42
3
0
28 Jun 2023
Neural Priming for Sample-Efficient Adaptation
Matthew Wallingford
Vivek Ramanujan
Alex Fang
Aditya Kusupati
Roozbeh Mottaghi
Aniruddha Kembhavi
Ludwig Schmidt
Ali Farhadi
VLM
108
13
0
16 Jun 2023
Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos
Md Zahid Hasan
Jiajing Chen
Jiyang Wang
Mohammed Shaiqur Rahman
Ameya Joshi
Senem Velipasalar
C. Hegde
Anuj Sharma
S. Sarkar
VLM
46
18
0
16 Jun 2023
Text Promptable Surgical Instrument Segmentation with Vision-Language Models
Zijian Zhou
Oluwatosin O. Alabi
Meng Wei
Tom Kamiel Magda Vercauteren
Miaojing Shi
MedIm
30
23
0
15 Jun 2023