2208.02131
Masked Vision and Language Modeling for Multi-modal Representation Learning
International Conference on Learning Representations (ICLR), 2022
3 August 2022
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
Papers citing "Masked Vision and Language Modeling for Multi-modal Representation Learning"
36 / 36 papers shown
Multilingual Vision-Language Models, A Survey
Andrei-Alexandru Manea
Jindřich Libovický
VLM
146
1
0
26 Sep 2025
EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition
Hugo Thimonier
Antony Perzo
Renaud Seguier
145
2
0
19 Aug 2025
Boosting Visual Knowledge-Intensive Training for LVLMs Through Causality-Driven Visual Object Completion
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Qingguo Hu
Ante Wang
Jia Song
Delai Qiu
Qingsong Liu
Jinsong Su
VLM
LRM
126
1
0
06 Aug 2025
Distribution-Based Masked Medical Vision-Language Model Using Structured Reports
Shreyank N. Gowda
Ruichi Zhang
Xiao Gu
Ying Weng
Lu Yang
VLM
250
1
0
29 Jul 2025
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang
Sarah Monazam Erfani
Yige Li
Jiabo He
James Bailey
AAML
474
9
0
08 May 2025
Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders
International Conference on Text, Speech and Dialogue (TSD), 2025
Andrei-Alexandru Manea
Jindřich Libovický
VLM
393
1
0
30 Apr 2025
REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval
Shabnam Choudhury
Yash Salunkhe
Sarthak Mehrotra
Biplab Banerjee
297
1
0
04 Apr 2025
SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding
Yimin Wei
Aoran Xiao
Yexian Ren
Yuting Zhu
Hongruixuan Chen
J. Xia
Xiangwei Zhu
VLM
459
7
0
04 Apr 2025
DGTRSD & DGTRS-CLIP: A Dual-Granularity Remote Sensing Image-Text Dataset and Vision Language Foundation Model for Alignment
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE J-STARS), 2025
Weizhi Chen
Yupeng Deng
Jin Wei
Jingbo Chen
Jiansheng Chen
Yuman Feng
Zhihao Xi
Diyou Liu
Kai Li
Yu Meng
VLM
313
2
0
25 Mar 2025
CAST: Cross-modal Alignment Similarity Test for Vision Language Models
International Conference on Computational Linguistics (COLING), 2024
Gautier Dagan
Olga Loginova
Anil Batra
CoGe
238
1
0
17 Sep 2024
NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality
Chaofan Tao
Gukyeong Kwon
Varad Gunjal
Hao Yang
Zhaowei Cai
Yonatan Dukler
Ashwin Swaminathan
R. Manmatha
Colin Jon Taylor
Stefano Soatto
CoGe
194
0
0
18 Aug 2024
Masked Image Modeling: A Survey
International Journal of Computer Vision (IJCV), 2024
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
Andrii Zadaianchuk
482
20
0
13 Aug 2024
Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement
Zijie Yue
Miaojing Shi
Hanli Wang
Shuai Ding
Qijun Chen
Shanlin Yang
346
1
0
11 Jul 2024
Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
Mingfang Zhang
Yifei Huang
Ruicong Liu
Yoichi Sato
206
17
0
09 Jul 2024
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Junjie Wang
Y. Zhang
Minghao Liu
Yin Zhang
Yatai Ji
...
Yujiu Yang
Ge Zhang
Ruibin Yuan
Bei Chen
Wenhu Chen
235
5
0
20 Jun 2024
IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Soumya Suvra Ghosal
Samyadeep Basu
Soheil Feizi
Dinesh Manocha
VLM
188
5
0
19 Jun 2024
ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions
Honglin Lin
Siyu Li
Gu Nan
Chaoyue Tang
Xueting Wang
...
Yankai Rong
Zhili Zhou
Yutong Gao
Qimei Cui
Xiaofeng Tao
170
0
0
29 May 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
412
15
0
05 Mar 2024
Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement
Cheng Li
Weijian Huang
Hao Yang
Jiarun Liu
Shanshan Wang
MedIm
221
13
0
21 Jan 2024
Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2024
Jakob Hackstein
Gencer Sumbul
Kai Norman Clasen
Begüm Demir
348
13
0
15 Jan 2024
Mask Grounding for Referring Image Segmentation
Yong Xien Chng
Henry Zheng
Yizeng Han
Xuchong Qiu
Gao Huang
ISeg
ObjD
383
43
0
19 Dec 2023
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yizhou Wang
YiXuan Wu
Weizhen He
Xun Guo
...
Mengwei He
Rui Zhao
Jian Wu
Tong He
Bin Wang
VLM
714
21
0
04 Dec 2023
Point Cloud Self-supervised Learning via 3D to Multi-view Masked Learner
Zhimin Chen
Yingwei Li
Xiao Guo
Longlong Jing
Liang Yang
Bing Li
3DPC
322
9
0
17 Nov 2023
FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space
Neural Information Processing Systems (NeurIPS), 2023
Shengzhong Liu
Tomoyoshi Kimura
Dongxin Liu
Ruijie Wang
Jinyang Li
Suhas Diggavi
Mani B. Srivastava
Tarek Abdelzaher
AI4TS
213
48
0
30 Oct 2023
VeCLIP: Improving CLIP Training via Visual-enriched Captions
European Conference on Computer Vision (ECCV), 2023
Zhengfeng Lai
Haotian Zhang
Bowen Zhang
Wentao Wu
Haoping Bai
...
Zhe Gan
Jiulong Shan
Chen-Nee Chuah
Yinfei Yang
Meng Cao
CLIP
VLM
363
59
0
11 Oct 2023
Continual Contrastive Spoken Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Umberto Cappellazzo
Enrico Fini
Muqiao Yang
Daniele Falavigna
Alessio Brutti
Bhiksha Raj
CLL
354
1
0
04 Oct 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
AAAI Conference on Artificial Intelligence (AAAI), 2023
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
196
20
0
23 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
434
151
0
25 Jul 2023
Global and Local Semantic Completion Learning for Vision-Language Pre-training
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Rong-Cheng Tu
Yatai Ji
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
252
8
0
12 Jun 2023
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Qian Jiang
Changyou Chen
Han Zhao
Liqun Chen
Q. Ping
S. D. Tran
Yi Xu
Belinda Zeng
Trishul Chilimbi
221
66
0
10 Mar 2023
Advancing Radiograph Representation Learning with Masked Record Modeling
International Conference on Learning Representations (ICLR), 2023
Hong-Yu Zhou
Chenyu Lian
Lian-cheng Wang
Yizhou Yu
MedIm
287
86
0
30 Jan 2023
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2023
Liya Wang
A. Tien
414
19
0
28 Jan 2023
Scaling Language-Image Pre-training via Masking
Computer Vision and Pattern Recognition (CVPR), 2022
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
375
393
0
01 Dec 2022
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
IEEE International Conference on Computer Vision (ICCV), 2022
Yuanze Lin
Chen Wei
Huiyu Wang
Alan Yuille
Cihang Xie
3DGS
308
17
0
21 Nov 2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022
Zijia Zhao
Longteng Guo
Xingjian He
Shuai Shao
Zehuan Yuan
Jing Liu
305
13
0
09 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
International Conference on Learning Representations (ICLR), 2022
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
396
167
0
02 Oct 2022