Cross-Modal Adapter for Vision-Language Retrieval

Pattern Recognition (Pattern Recogn.), 2022
17 November 2022
Haojun Jiang
Jianke Zhang
Rui Huang
Chunjiang Ge
Zanlin Ni
Jiwen Lu
Gao Huang

Papers citing "Cross-Modal Adapter for Vision-Language Retrieval"

31 / 31 papers shown
Repeating Words for Video-Language Retrieval with Coarse-to-Fine Objectives
Haoyu Zhao
Jiaxi Gu
Shicong Wang
Xing Zhang
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
198
0
0
20 Aug 2025
pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models
Sajjad Ghiasvand
Mahnoosh Alizadeh
Ramtin Pedarsani
VLM
382
1
0
07 Jul 2025
Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval
Hailong Ning
Siying Wang
Tao Lei
Xiaopeng Cao
Huanmin Dou
Bin Zhao
Asoke K. Nandi
Petia Radeva
198
3
0
22 May 2025
UP-Person: Unified Parameter-Efficient Transfer Learning for Text-based Person Retrieval
Yating Liu
Yaowei Li
Xiangyuan Lan
Wenming Yang
Zimo Liu
Q. Liao
313
4
0
14 Apr 2025
A Resource-Efficient Training Framework for Remote Sensing Text-Image Retrieval
Weihang Zhang
Jihao Li
Shuoke Li
Ziqing Niu
Jialiang Chen
Wenkai Zhang
VLM
295
1
0
18 Jan 2025
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
581
38
0
26 Nov 2024
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities
Ehsan Faghihi
Mohammedreza Zarenejad
Ali-Asghar Beheshti Shirazi
299
2
0
04 Nov 2024
Beyond Coarse-Grained Matching in Video-Text Retrieval
Asian Conference on Computer Vision (ACCV), 2024
Aozhu Chen
Hazel Doughty
Xirong Li
Cees G. M. Snoek
330
2
0
16 Oct 2024
Deep Transfer Learning: Model Framework and Error Analysis
Yuling Jiao
Huazhen Lin
Yuchen Luo
Jerry Zhijian Yang
518
2
0
12 Oct 2024
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Ting Liu
Zunnan Xu
Yue Hu
Liangtao Shi
Zhiqiang Wang
Quanjun Yin
667
8
0
20 Sep 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
Hanane Azzag
M. Lebbah
ObjD
383
3
0
17 Sep 2024
Selective Vision-Language Subspace Projection for Few-shot CLIP
Xingyu Zhu
Beier Zhu
Yi Tan
Shuo Wang
Yanbin Hao
Haiqi Zhang
VLM
265
23
0
24 Jul 2024
Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train
Haojun Jiang
Meng Li
Zhenguo Sun
Ning Jia
Yu Sun
Shaqi Luo
Shiji Song
Gao Huang
334
6
0
28 Jun 2024
Cardiac Copilot: Automatic Probe Guidance for Echocardiography with World Model
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024
Haojun Jiang
Zhenguo Sun
Ning Jia
Meng Li
Yu Sun
Shaqi Luo
Shiji Song
Gao Huang
235
17
0
19 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
648
39
1
09 Jun 2024
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
Meng Cao
Haoran Tang
Jinfa Huang
Peng Jin
Can Zhang
Ruyang Liu
Long Chen
Xiaodan Liang
Li-ming Yuan
Ge Li
340
23
0
29 May 2024
CLIP model is an Efficient Online Lifelong Learner
Leyuan Wang
Liuyu Xiang
Yujie Wei
Yunlong Wang
Zhaofeng He
VLM
CLL
294
4
0
24 May 2024
DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding
IEEE International Conference on Multimedia and Expo (ICME), 2024
Ting Liu
Xuyang Liu
Siteng Huang
Honggang Chen
Quanjun Yin
Long Qin
Donglin Wang
Yue Hu
321
13
0
10 May 2024
Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment
Tengjun Huang
439
11
0
28 Apr 2024
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval
Xiangpeng Yang
Linchao Zhu
Xiaohan Wang
Yi Yang
VLM
347
52
0
19 Jan 2024
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
S. DarshanSingh
Zeeshan Khan
Makarand Tapaswi
VLM
CLIP
257
6
0
15 Jan 2024
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Artificial Intelligence Review (Artif Intell Rev), 2024
Fan Liu
Tianshu Zhang
Wenwen Dai
Wenwen Cai
Xiaocong Zhou
Delong Chen
VLM
OffRL
377
58
0
03 Jan 2024
READ-PVLA: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling
AAAI Conference on Artificial Intelligence (AAAI), 2023
Thong Nguyen
Xiaobao Wu
Xinshuai Dong
Khoi M. Le
Zhiyuan Hu
Cong-Duy Nguyen
See-Kiong Ng
Anh Tuan Luu
247
2
0
12 Dec 2023
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan
Md. Mohaiminul Islam
Thomas Seidl
Gedas Bertasius
574
13
0
11 Dec 2023
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao
Wenhao Wu
Zhiheng Li
VLM
358
16
0
27 Nov 2023
Unified Coarse-to-Fine Alignment for Video-Text Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyang Wang
Yi-Lin Sung
Feng Cheng
Gedas Bertasius
Joey Tianyi Zhou
470
89
0
18 Sep 2023
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2023
Yuan Yuan
Yangfan Zhan
Zhitong Xiong
VLM
290
70
0
24 Aug 2023
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Chaorui Deng
Qi Chen
Pengda Qin
Dave Zhenyu Chen
Qi Wu
VLM
CLIP
294
49
0
15 Aug 2023
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter
Binjie Zhang
Yixiao Ge
Xuyuan Xu
Ying Shan
Mike Zheng Shou
239
9
0
22 Jun 2023
Visual Tuning
ACM Computing Surveys (ACM Comput. Surv.), 2023
Bruce X. B. Yu
Jianlong Chang
Haixin Wang
Lin Liu
Shijie Wang
...
Lingxi Xie
Haojie Li
Zhouchen Lin
Qi Tian
Chang Wen Chen
VLM
537
64
0
10 May 2023
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Siteng Huang
Biao Gong
Yutong Feng
Min Zhang
Yiliang Lv
Xuetao Zhang
CoGe
226
40
0
27 Mar 2023