2211.09623
Cited By
Cross-Modal Adapter for Text-Video Retrieval
17 November 2022
Haojun Jiang
Jianke Zhang
Rui Huang
Chunjiang Ge
Zanlin Ni
Jiwen Lu
Jie Zhou
S. Song
Gao Huang
Papers citing
"Cross-Modal Adapter for Text-Video Retrieval"
38 / 38 papers shown
Title
UP-Person: Unified Parameter-Efficient Transfer Learning for Text-based Person Retrieval
Yating Liu
Yaowei Li
Xiangyuan Lan
Wenming Yang
Zimo Liu
Q. Liao
19
0
0
14 Apr 2025
A Resource-Efficient Training Framework for Remote Sensing Text-Image Retrieval
Weihang Zhang
Jihao Li
Shuoke Li
Ziqing Niu
Jialiang Chen
Wenkai Zhang
VLM
33
0
0
18 Jan 2025
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension
Ting Liu
Zunnan Xu
Yue Hu
Liangtao Shi
Zhiqiang Wang
Quanjun Yin
54
2
0
03 Jan 2025
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
86
1
0
26 Nov 2024
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities
Ehsan Faghihi
Mohammedreza Zarenejad
Ali-Asghar Beheshti Shirazi
21
0
0
04 Nov 2024
Beyond Coarse-Grained Matching in Video-Text Retrieval
Aozhu Chen
Hazel Doughty
Xirong Li
Cees G. M. Snoek
16
0
0
16 Oct 2024
Deep Transfer Learning: Model Framework and Error Analysis
Yuling Jiao
Huazhen Lin
Yuchen Luo
Jerry Zhijian Yang
26
1
0
12 Oct 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
Bilal Faye
Hanane Azzag
M. Lebbah
ObjD
19
0
0
17 Sep 2024
Selective Vision-Language Subspace Projection for Few-shot CLIP
Xingyu Zhu
Beier Zhu
Yi Tan
Shuo Wang
Yanbin Hao
H. Zhang
VLM
27
2
0
24 Jul 2024
Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train
Haojun Jiang
Meng Li
Zhenguo Sun
Ning Jia
Yu Sun
Shaqi Luo
Shiji Song
Gao Huang
33
2
0
28 Jun 2024
Cardiac Copilot: Automatic Probe Guidance for Echocardiography with World Model
Haojun Jiang
Zhenguo Sun
Ning Jia
Meng Li
Yu Sun
Shaqi Luo
Shiji Song
Gao Huang
18
5
0
19 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
26
9
1
09 Jun 2024
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
Meng Cao
Haoran Tang
Jinfa Huang
Peng Jin
Can Zhang
Ruyang Liu
Long Chen
Xiaodan Liang
Li-ming Yuan
Ge Li
79
11
0
29 May 2024
CLIP model is an Efficient Online Lifelong Learner
Leyuan Wang
Liuyu Xiang
Yujie Wei
Yunlong Wang
Zhaofeng He
VLM
CLL
14
2
0
24 May 2024
DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding
Ting Liu
Xuyang Liu
Siteng Huang
Honggang Chen
Quanjun Yin
Long Qin
Donglin Wang
Yue Hu
22
5
0
10 May 2024
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment
Tengjun Huang
18
0
0
28 Apr 2024
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval
Xiangpeng Yang
Linchao Zhu
Xiaohan Wang
Yi Yang
VLM
15
4
0
19 Jan 2024
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
Darshan Singh S
Zeeshan Khan
Makarand Tapaswi
VLM
CLIP
15
2
0
15 Jan 2024
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Fan Liu
Tianshu Zhang
Wenwen Dai
Wenwen Cai
Xiaocong Zhou
Delong Chen
VLM
OffRL
10
19
0
03 Jan 2024
READ-PVLA: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling
Thong Nguyen
Xiaobao Wu
Xinshuai Dong
Khoi M. Le
Zhiyuan Hu
Cong-Duy Nguyen
See-Kiong Ng
Anh Tuan Luu
19
2
0
12 Dec 2023
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan
Md. Mohaiminul Islam
Thomas Seidl
Gedas Bertasius
9
3
0
11 Dec 2023
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao
Wenhao Wu
Zhiheng Li
VLM
79
9
0
27 Nov 2023
Unified Coarse-to-Fine Alignment for Video-Text Retrieval
Ziyang Wang
Yi-Lin Sung
Feng Cheng
Gedas Bertasius
Mohit Bansal
75
41
0
18 Sep 2023
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval
Yuan Yuan
Yangfan Zhan
Zhitong Xiong
VLM
15
38
0
24 Aug 2023
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
Chaorui Deng
Qi Chen
Pengda Qin
Dave Zhenyu Chen
Qi Wu
VLM
CLIP
22
11
0
15 Aug 2023
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter
Binjie Zhang
Yixiao Ge
Xuyuan Xu
Ying Shan
Mike Zheng Shou
37
7
0
22 Jun 2023
Visual Tuning
Bruce X. B. Yu
Jianlong Chang
Haixin Wang
Lin Liu
Shijie Wang
...
Lingxi Xie
Haojie Li
Zhouchen Lin
Qi Tian
Chang Wen Chen
VLM
31
37
0
10 May 2023
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
Siteng Huang
Biao Gong
Yutong Feng
Min Zhang
Yiliang Lv
Donglin Wang
CoGe
8
9
0
27 Mar 2023
MaPLe: Multi-modal Prompt Learning
Muhammad Uzair Khattak
H. Rasheed
Muhammad Maaz
Salman Khan
F. Khan
VPVLM
VLM
178
521
0
06 Oct 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
110
60
0
17 May 2022
Glance and Focus Networks for Dynamic Visual Recognition
Gao Huang
Yulin Wang
Kangchen Lv
Haojun Jiang
Wenhui Huang
Pengfei Qi
S. Song
3DH
43
49
0
09 Jan 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
239
554
0
28 Sep 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
319
2,108
0
02 Sep 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
298
771
0
18 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
275
3,784
0
18 Apr 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Karteek Alahari
Cordelia Schmid
ViT
396
532
0
21 Jul 2020
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktäschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
391
2,216
0
03 Sep 2019
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
L. van der Maaten
Kilian Q. Weinberger
PINN
3DV
236
35,884
0
25 Aug 2016