Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2203.02053
Cited By
Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning
3 March 2022
Weixin Liang
Yuhui Zhang
Yongchan Kwon
Serena Yeung
James Y. Zou
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning"
50 / 61 papers shown
Title
Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning
François Role
Sébastien Meyer
Victor Amblard
VLM
48
0
0
06 May 2025
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
Davide Talon
Federico Girella
Ziyue Liu
Marco Cristani
Yiming Wang
VLM
48
0
0
06 May 2025
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIP
VLM
76
0
0
30 Apr 2025
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding
Binh M. Le
Shaoyuan Xu
Jinmiao Fu
Zhishen Huang
Moyan Li
Yanhui Guo
Hongdong Li
Sameera Ramasinghe
Bryan Wang
28
0
0
03 Apr 2025
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Wenzhe Yin
Zehao Xiao
Pan Zhou
Shujian Yu
Jiayi Shen
J. Sonke
E. Gavves
34
0
0
24 Feb 2025
Adaptive Neural Networks for Intelligent Data-Driven Development
Youssef Shoeb
Azarm Nowzad
Hanno Gottschalk
63
2
0
14 Feb 2025
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Marco Mistretta
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
Andrew D. Bagdanov
CLIP
VLM
99
2
0
06 Feb 2025
VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance
Divyansh Srivastava
Beatriz Cabrero-Daniel
Christian Berger
VLM
57
8
0
17 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
45
0
0
03 Jan 2025
Gramian Multimodal Representation Learning and Alignment
Giordano Cicchetti
Eleonora Grassucci
Luigi Sigillo
Danilo Comminiello
81
0
0
16 Dec 2024
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
Andreas Koukounas
Georgios Mastrapas
Bo Wang
Mohammad Kalim Akram
Sedigheh Eslami
Michael Gunther
Isabelle Mohr
Saba Sturua
Scott Martens
Nan Wang
VLM
103
6
0
11 Dec 2024
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
M. Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Botian Shi
Tao Chen
Bo Zhang
Xiangyu Yue
86
4
0
08 Dec 2024
An Information Criterion for Controlled Disentanglement of Multimodal Data
Chenyu Wang
Sharut Gupta
Xinyi Zhang
Sana Tonekaboni
Stefanie Jegelka
Tommi Jaakkola
Caroline Uhler
DRL
32
1
0
31 Oct 2024
ResiDual Transformer Alignment with Spectral Decomposition
Lorenzo Basile
Valentino Maiorca
Luca Bortolussi
Emanuele Rodolà
Francesco Locatello
45
1
0
31 Oct 2024
Improving Long-Text Alignment for Text-to-Image Diffusion Models
Luping Liu
Chao Du
Tianyu Pang
Zehan Wang
Chongxuan Li
Dong Xu
VLM
51
5
0
15 Oct 2024
ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy
Hong Li
Zhiquan Tan
Xingyu Li
Weiran Huang
CLL
MoMe
21
1
0
14 Oct 2024
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment
Hugo Malard
Michel Olvera
Stéphane Lathuilière
S. Essid
VLM
30
0
0
08 Oct 2024
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Zixuan Wang
Chi-Keung Tang
Chi-Keung Tang
DiffM
VGen
LLMAG
41
4
0
04 Oct 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
44
7
0
30 Sep 2024
Language-Queried Target Sound Extraction Without Parallel Training Data
Hao Ma
Zhiyuan Peng
Xu Li
Yukai Li
Mingjie Shao
Qiuqiang Kong
Ju Liu
VLM
69
1
0
14 Sep 2024
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang
Dong Shen
Chaoxiang Cai
Fan Yang
Size Li
Di Zhang
Xi Li
MoE
41
2
0
28 Jun 2024
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Kiho Park
Yo Joong Choe
Yibo Jiang
Victor Veitch
45
24
0
03 Jun 2024
Topological Perspectives on Optimal Multimodal Embedding Spaces
Abdul Aziz
Abdul Rahim
BDL
29
0
0
29 May 2024
Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models
Yifei Ming
Yixuan Li
VLM
23
7
0
02 May 2024
RankCLIP: Ranking-Consistent Language-Image Pretraining
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zhili Feng
Zenghui Ding
Yining Sun
SSL
VLM
43
7
0
15 Apr 2024
Dissecting Query-Key Interaction in Vision Transformers
Xu Pan
Aaron Philip
Ziqian Xie
Odelia Schwartz
30
1
0
04 Apr 2024
Prioritized Semantic Learning for Zero-shot Instance Navigation
Xander Sun
Louis Lau
Hoyard Zhi
Ronghe Qiu
Junwei Liang
30
8
0
18 Mar 2024
Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation
Zhekai Du
Xinyao Li
Fengling Li
Ke Lu
Lei Zhu
Jingjing Li
38
15
0
05 Mar 2024
Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning
Yuhang Liu
Zhen Zhang
Dong Gong
Biwei Huang
Mingming Gong
A. Hengel
Kun Zhang
Javen Qinfeng Shi
J. Shi
41
7
0
09 Feb 2024
Modality-Aware Representation Learning for Zero-shot Sketch-based Image Retrieval
Eun-Young Lyou
Doyeon Lee
Jooeun Kim
Joonseok Lee
34
4
0
10 Jan 2024
AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis
Qiuhui Chen
Yi Hong
MedIm
15
1
0
02 Jan 2024
TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training
Chaoya Jiang
Wei Ye
Haiyang Xu
Qinghao Ye
Mingshi Yan
Ji Zhang
Shikun Zhang
CLIP
VLM
11
4
0
14 Dec 2023
Multimodal Pretraining of Medical Time Series and Notes
Ryan N. King
Tianbao Yang
Bobak J. Mortazavi
21
12
0
11 Dec 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
21
64
0
07 Nov 2023
Scene Graph Conditioning in Latent Diffusion
Frank Fundel
DiffM
25
0
0
16 Oct 2023
Multimodal Federated Learning in Healthcare: a Review
Jacob Thrasher
Alina Devkota
Prasiddha Siwakotai
Rohit Chivukula
Pranav Poudel
Chaunbo Hu
Binod Bhattarai
P. Gyawali
26
7
0
14 Oct 2023
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
19
5
0
13 Oct 2023
Weakly-supervised Automated Audio Captioning via text only training
Theodoros Kouzelis
V. Katsouros
CLIP
25
6
0
21 Sep 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
27
13
0
25 Aug 2023
Adversarial Illusions in Multi-Modal Embeddings
Tingwei Zhang
Rishi Jha
Eugene Bagdasaryan
Vitaly Shmatikov
AAML
24
8
0
22 Aug 2023
Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
Erfan Shayegani
Yue Dong
Nael B. Abu-Ghazaleh
20
126
0
26 Jul 2023
Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer
M. Hofmarcher
Sepp Hochreiter
Thomas Adler
CLIP
VLM
42
0
0
10 Jul 2023
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels
Bang-ju Yang
Fenglin Liu
Zheng Li
Qingyu Yin
Chenyu You
Bing Yin
Yuexian Zou
VLM
26
5
0
05 Jul 2023
Improving neural network representations using human similarity judgments
Lukas Muttenthaler
Lorenz Linhardt
Jonas Dippel
Robert A. Vandermeulen
Katherine L. Hermann
Andrew Kyle Lampinen
Simon Kornblith
37
29
0
07 Jun 2023
Learning without Forgetting for Vision-Language Models
Da-Wei Zhou
Yuanhan Zhang
Jingyi Ning
Jingyi Ning
De-Chuan Zhan
De-Chuan Zhan
Ziwei Liu
VLM
CLL
69
37
0
30 May 2023
Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors
Paul S. Scotti
Atmadeep Banerjee
J. Goode
Stepan Shabalin
A. Nguyen
...
Nathalie Verlinde
Elad Yundler
David Weisberg
K. A. Norman
Tanishq Mathew Abraham
DiffM
32
106
0
29 May 2023
OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding
Minghua Liu
Ruoxi Shi
Kaiming Kuang
Yinhao Zhu
Xuanlin Li
Shizhong Han
H. Cai
Fatih Porikli
Hao Su
3DPC
27
116
0
18 May 2023
TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision
Jiacheng Wei
Hao Wang
Jiashi Feng
Guosheng Lin
Kim-Hui Yap
19
30
0
23 Mar 2023
Linear Spaces of Meanings: Compositional Structures in Vision-Language Models
Matthew Trager
Pramuditha Perera
L. Zancato
Alessandro Achille
Parminder Bhatia
Stefano Soatto
CoGe
19
30
0
28 Feb 2023
Shifted Diffusion for Text-to-image Generation
Yufan Zhou
Bingchen Liu
Yizhe Zhu
Xiao Yang
Changyou Chen
Jinhui Xu
DiffM
22
39
0
24 Nov 2022
1
2
Next