Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning

Neural Information Processing Systems (NeurIPS), 2022
3 March 2022
Weixin Liang
Yuhui Zhang
Yongchan Kwon
Serena Yeung
James Zou
    VLM
ArXiv (abs) · PDF · HTML · Github (33247★)

Papers citing "Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning"

50 / 368 papers shown
SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP
Christoph Timmermann
Hyunse Lee
Woojin Lee
VLM
197
2
0
10 Apr 2026
One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework
Lorenzo Bianchi
Giacomo Pacini
F. Carrara
Nicola Messina
Giuseppe Amato
Fabrizio Falchi
3DV, VLM
240
1
0
30 Mar 2026
DashFusion: Dual-stream Alignment with Hierarchical Bottleneck Fusion for Multimodal Sentiment Analysis
IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2025
Yuhua Wen
Qifei Li
Yingying Zhou
Yingming Gao
Zhengqi Wen
Jianhua Tao
Ya Li
152
3
0
05 Dec 2025
Language Models as Semantic Teachers: Post-Training Alignment for Medical Audio Understanding
Tsai-Ning Wang
Lin-Lin Chen
Neil Zeghidour
Aaqib Saeed
LM&MA, AuLLM, ELM
204
1
0
04 Dec 2025
Text-Only Training for Image Captioning with Retrieval Augmentation and Modality Gap Correction
Rui Fonseca
Bruno Martins
Gil Rocha
VLM
166
0
0
03 Dec 2025
Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
Shojiro Yamabe
Futa Waseda
Daiki Shiono
Tsubasa Takahashi
DiffM, MLLM, VLM
299
1
0
03 Dec 2025
Minimal neuron ablation triggers catastrophic collapse in the language core of Large Vision-Language Models
Cen Lu
Yung-Chen Tang
Andrea Cavallaro
92
0
0
30 Nov 2025
Bridging the Modality Gap by Similarity Standardization with Pseudo-Positive Samples
Shuhei Yamashita
Daiki Shirafuji
Tatsuhiko Saito
116
1
0
27 Nov 2025
Training-Free Diffusion Priors for Text-to-Image Generation via Optimization-based Visual Inversion
Samuele Dell'Erba
Andrew D. Bagdanov
227
0
0
25 Nov 2025
Decoupling and Damping: Structurally-Regularized Gradient Matching for Multimodal Graph Condensation
Lian Shen
Zhendan Chen
Yinhui Jiang
Meijia Song
Ziming Su
Juan Liu
Xiangrong Liu
158
1
0
25 Nov 2025
UISearch: Graph-Based Embeddings for Multimodal Enterprise UI Screenshots Retrieval
Maroun Ayli
Youssef Bakouny
Tushar Sharma
Nader Jalloul
Hani Seifeddine
Rima Kilany
AI4TS
310
0
0
24 Nov 2025
A Little More Like This: Text-to-Image Retrieval with Vision-Language Models Using Relevance Feedback
Bulat Khaertdinov
Mirela Popa
Nava Tintarev
VLM
201
0
0
21 Nov 2025
uCLIP: Parameter-Efficient Multilingual Extension of Vision-Language Models with Unpaired Data
Dahyun Chung
Donghyun Shin
Yujin Sung
Seunggi Moon
Jinwoo Jeon
Byung-Jun Lee
CLIP, VLM
222
0
0
17 Nov 2025
FoCLIP: A Feature-Space Misalignment Framework for CLIP-Based Image Manipulation and Detection
Yulin Chen
Zeyuan Wang
Tianyuan Yu
Yingmei Wei
Liang Bai
133
0
0
10 Nov 2025
MoRA: Missing Modality Low-Rank Adaptation for Visual Recognition
Shu Zhao
Nilesh A. Ahuja
Tan Yu
Tianyi Shen
V. Narayanan
VLM
209
1
0
09 Nov 2025
On the Brittleness of CLIP Text Encoders
Allie Tran
Luca Rossetto
293
1
0
06 Nov 2025
ProM3E: Probabilistic Masked MultiModal Embedding Model for Ecology
Srikumar Sastry
Subash Khanal
Aayush Dhakal
Jiayu Lin
Dan Cher
Phoenix Jarosz
Nathan Jacobs
198
0
0
04 Nov 2025
A Retrospect to Multi-prompt Learning across Vision and Language
IEEE International Conference on Computer Vision (ICCV), 2023
Ziliang Chen
Xin Huang
Quanlong Guan
Liang Lin
Weiqi Luo
VPVLM, VLM
480
12
0
31 Oct 2025
Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds
Wu Wei
Xiaomeng Fan
Yuwei Wu
Zhi Gao
P. Li
Yunde Jia
Mehrtash Harandi
186
1
0
31 Oct 2025
A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models
Shihab Aaqil Ahamed
Udaya S.K.P. Miriya Thanthrige
Ranga Rodrigo
Muhammad Haris Khan
VLM
272
0
0
30 Oct 2025
T-REGS: Minimum Spanning Tree Regularization for Self-Supervised Learning
Julie Mordacq
David Loiseaux
Vicky Kalogeiton
S. Oudot
257
0
0
27 Oct 2025
Data-Centric Lessons To Improve Speech-Language Pretraining
Vishaal Udandarao
Zhiyun Lu
Xuankai Chang
Yongqiang Wang
Violet Z. Yao
Albin Madapally Jose
Fartash Faghri
Josh Gardner
Chung-Cheng Chiu
185
2
0
22 Oct 2025
Theoretical Refinement of CLIP by Utilizing Linear Structure of Optimal Similarity
Naoki Yoshida
Satoshi Hayakawa
Yuhta Takida
Toshimitsu Uesaka
Hiromi Wakaki
Yuki Mitsufuji
152
0
0
17 Oct 2025
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
Mor Ventura
Michael Toker
Or Patashnik
Yonatan Belinkov
Roi Reichart
259
0
0
16 Oct 2025
QuASH: Using Natural-Language Heuristics to Query Visual-Language Robotic Maps
Matti Pekkanen
Francesco Verdoja
Ville Kyrki
165
0
0
16 Oct 2025
When Embedding Models Meet: Procrustes Bounds and Applications
Lucas Maystre
Alvaro Ortega Gonzalez
Charles Park
Rares Dolga
Tudor Berariu
Yu Zhao
K. Ciosek
199
1
0
15 Oct 2025
Understanding the Modality Gap: An Empirical Study on the Speech-Text Alignment Mechanism of Large Speech Language Models
Bajian Xiang
Shuaijiang Zhao
Tingwei Guo
Wei Zou
135
5
0
14 Oct 2025
Lifting Manifolds to Mitigate Pseudo-Alignment in LLM4TS
Liangwei Nathan Zheng
Wenhao Liang
Wei Emma Zhang
Miao Xu
Olaf Maennel
Weitong Chen
AI4TS
170
2
0
14 Oct 2025
Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap
KiHyun Nam
J. Choi
Hyeongkeun Lee
Jungwoo Heo
Joon Son Chung
150
2
0
13 Oct 2025
Self-Supervised Representation Learning with ID-Content Modality Alignment for Sequential Recommendation
Donglin Zhou
Weike Pan
Zhong Ming
AI4TS
197
0
0
12 Oct 2025
DREAM: A Benchmark Study for Deepfake photoREalism AssessMent
Bo Peng
Zichuan Wang
Sheng Yu
Xiaochuan Jin
Wei Wang
Jing Dong
EGVM
244
0
0
11 Oct 2025
D-TPT: Dimensional Entropy Maximization for Calibrating Test-Time Prompt Tuning in Vision-Language Models
Jisu Han
Wonjun Hwang
VLM
224
1
0
10 Oct 2025
Partial Information Decomposition via Normalizing Flows in Latent Gaussian Distributions
Wenyuan Zhao
Adithya Balachandran
Chao Tian
Paul Pu Liang
241
1
0
06 Oct 2025
Mitigating Modal Imbalance in Multimodal Reasoning
Chen Henry Wu
Neil Kale
Aditi Raghunathan
LRM
186
6
0
02 Oct 2025
Generalized Contrastive Learning for Universal Multimodal Retrieval
Jungsoo Lee
Janghoon Cho
Hyojin Park
Munawar Hayat
Kyuwoong Hwang
Fatih Porikli
Sungha Choi
VLM
240
4
0
30 Sep 2025
Semantic Compression via Multimodal Representation Learning
Eleonora Grassucci
Giordano Cicchetti
A. Uncini
Danilo Comminiello
195
0
0
29 Sep 2025
Hierarchical Representation Matching for CLIP-based Class-Incremental Learning
Zhen-Hao Wen
Yan Wang
Ji Feng
Han-Jia Ye
De-Chuan Zhan
Da-Wei Zhou
CLL, VLM
209
1
0
26 Sep 2025
LABELING COPILOT: A Deep Research Agent for Automated Data Curation in Computer Vision
Debargha Ganguly
Sumit Kumar
Ishwar B Balappanawar
Weicong Chen
Shashank Kambhatla
Srinivasan Iyengar
Shivkumar Kalyanaraman
Ponnurangam Kumaraguru
Vipin Chaudhary
VLM
247
2
0
26 Sep 2025
Improving Generalizability and Undetectability for Targeted Adversarial Attacks on Multimodal Pre-trained Models
Zhifang Zhang
Jiahan Zhang
S. Kevin Zhou
Qi Wei
Shuo He
Feng Liu
Bingquan Shen
AAML
385
4
0
24 Sep 2025
Single-Branch Network Architectures to Close the Modality Gap in Multimodal Recommendation
Christian Ganhor
Marta Moscati
Anna Hausberger
Shah Nawaz
Markus Schedl
HAI, OffRL
191
4
0
23 Sep 2025
A Modality-Aware Cooperative Co-Evolutionary Framework for Multimodal Graph Neural Architecture Search
Sixuan Wang
Jiao Yin
Jinli Cao
MingJian Tang
Yong-Feng Ge
133
0
0
23 Sep 2025
Global Minimizers of Sigmoid Contrastive Loss
Kiril Bangachev
Guy Bresler
Iliyas Noman
Yury Polyanskiy
231
1
0
23 Sep 2025
Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene Descriptions
Ioanna Ntinou
Alexandros Xenos
Yassine Ouali
Adrian Bulat
Georgios Tzimiropoulos
VLM
182
2
0
23 Sep 2025
Can LLMs Reason Over Non-Text Modalities in a Training-Free Manner? A Case Study with In-Context Representation Learning
Tianle Zhang
Wanlong Fang
Jonathan Woo
Paridhi Latawa
Deepak A. Subramanian
Alvin Chan
267
2
0
22 Sep 2025
ADVEDM: Fine-grained Adversarial Attack against VLM-based Embodied Agents
Yichen Wang
Hangtao Zhang
Hewen Pan
Ziqi Zhou
Xianlong Wang
Peijin Guo
Lulu Xue
Shengshan Hu
Minghui Li
Leo Yu Zhang
AAML
278
10
0
20 Sep 2025
SitLLM: Large Language Models for Sitting Posture Health Understanding via Pressure Sensor Data
Jian Gao
Fufangchen Zhao
Yiyang Zhang
Danfeng Yan
144
0
0
16 Sep 2025
Cross-Modal Retrieval with Cauchy-Schwarz Divergence
Jiahao Zhang
Wenzhe Yin
Shujian Yu
175
0
0
15 Sep 2025
Lost in Embeddings: Information Loss in Vision-Language Models
Wenyan Li
Raphael Tang
Chengzu Li
Caiqi Zhang
Ivan Vulić
Anders Søgaard
VLM
169
8
0
15 Sep 2025
Towards Understanding Visual Grounding in Visual Language Models
Georgios Pantazopoulos
Eda B. Özyiğit
ObjD
515
4
0
12 Sep 2025
Enhancing 3D Medical Image Understanding with Pretraining Aided by 2D Multimodal Large Language Models
IEEE Journal of Biomedical and Health Informatics (JBHI), 2025
Qiuhui Chen
Xuancheng Yao
Huping Ye
Yi Hong
MedIm
165
1
0
11 Sep 2025