ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2208.02131
  4. Cited By
Masked Vision and Language Modeling for Multi-modal Representation
  Learning

Masked Vision and Language Modeling for Multi-modal Representation Learning

3 August 2022
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
ArXivPDFHTML

Papers citing "Masked Vision and Language Modeling for Multi-modal Representation Learning"

50 / 60 papers shown
Title
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang
Sarah Monazam Erfani
Yige Li
Xingjun Ma
James Bailey
AAML
34
0
0
08 May 2025
Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders
Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders
Andrei-Alexandru Manea
Jindřich Libovický
VLM
52
0
0
30 Apr 2025
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye
Hao Tang
LM&MA
76
0
0
29 Apr 2025
SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding
SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding
Yimin Wei
Aoran Xiao
Yexian Ren
Yuting Zhu
Hongruixuan Chen
J. Xia
Naoto Yokoya
VLM
66
0
0
04 Apr 2025
REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval
REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval
Shabnam Choudhury
Yash Salunkhe
Sarthak Mehrotra
Biplab Banerjee
29
0
0
04 Apr 2025
Enhancing Image Resolution of Solar Magnetograms: A Latent Diffusion Model Approach
Enhancing Image Resolution of Solar Magnetograms: A Latent Diffusion Model Approach
Francesco P. Ramunno
Paolo Massa
Vitaliy Kinakh
Brandon Panos
A. Csillaghy
S. Voloshynovskiy
DiffM
53
0
0
31 Mar 2025
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text
Weizhi Chen
Jingbo Chen
Yupeng Deng
Jiansheng Chen
Yuman Feng
Zhihao Xi
Diyou Liu
Kai Li
Yu Meng
VLM
51
0
0
25 Mar 2025
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation
  Models
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models
Rabin Adhikari
Safal Thapaliya
Manish Dhakal
Bishesh Khanal
MLLM
VLM
19
0
0
07 Oct 2024
CAST: Cross-modal Alignment Similarity Test for Vision Language Models
CAST: Cross-modal Alignment Similarity Test for Vision Language Models
Gautier Dagan
Olga Loginova
Anil Batra
CoGe
72
1
0
17 Sep 2024
PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose
  Representation
PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation
Ginger Delmas
Philippe Weinzaepfel
Francesc Moreno-Noguer
Grégory Rogez
27
2
0
10 Sep 2024
NAVERO: Unlocking Fine-Grained Semantics for Video-Language
  Compositionality
NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality
Chaofan Tao
Gukyeong Kwon
Varad Gunjal
Hao Yang
Zhaowei Cai
Yonatan Dukler
Ashwin Swaminathan
R. Manmatha
Colin Jon Taylor
Stefano Soatto
CoGe
27
0
0
18 Aug 2024
Cross-Modal Denoising: A Novel Training Paradigm for Enhancing
  Speech-Image Retrieval
Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval
Lifeng Zhou
Yuke Li
Rui Deng
Yuting Yang
Haoqi Zhu
21
0
0
15 Aug 2024
Masked Image Modeling: A Survey
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
56
6
0
13 Aug 2024
MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
Biao Wu
Yutong Xie
Zeyu Zhang
Minh Hieu Phan
Qi Chen
Ling-Hao Chen
Qi Wu
LM&MA
24
0
0
28 Jul 2024
Category-Extensible Out-of-Distribution Detection via Hierarchical
  Context Descriptions
Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions
Kai-Chun Liu
Zhihang Fu
Chao Chen
Sheng Jin
Ze Chen
Mingyuan Tao
Rongxin Jiang
Jieping Ye
VLM
OODD
49
4
0
23 Jul 2024
Masks and Manuscripts: Advancing Medical Pre-training with End-to-End
  Masking and Narrative Structuring
Masks and Manuscripts: Advancing Medical Pre-training with End-to-End Masking and Narrative Structuring
Shreyank N. Gowda
David A. Clifton
MedIm
21
1
0
23 Jul 2024
Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement
Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement
Zijie Yue
Miaojing Shi
Hanli Wang
Shuai Ding
Qijun Chen
Shanlin Yang
35
0
0
11 Jul 2024
Masked Video and Body-worn IMU Autoencoder for Egocentric Action
  Recognition
Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
Mingfang Zhang
Yifei Huang
Ruicong Liu
Yoichi Sato
29
4
0
09 Jul 2024
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal
  Documents
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Junjie Wang
Yin Zhang
Yatai Ji
Yuxiang Zhang
Chunyang Jiang
...
Bei Chen
Qunshu Lin
Minghao Liu
Ge Zhang
Wenhu Chen
32
3
0
20 Jun 2024
IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning
IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning
Soumya Suvra Ghosal
Samyadeep Basu
S. Feizi
Dinesh Manocha
VLM
33
2
0
19 Jun 2024
ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval
  from Linguistically Complex Descriptions
ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions
Honglin Lin
Siyu Li
Gu Nan
Chaoyue Tang
Xueting Wang
...
Yankai Rong
Zhili Zhou
Yutong Gao
Qimei Cui
Xiaofeng Tao
25
0
0
29 May 2024
MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning
MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning
Junjie Wang
Guangjing Yang
Wentao Chen
Huahui Yi
Xiaohu Wu
Qicheng Lao
MoE
ALM
32
0
0
29 May 2024
From Orthogonality to Dependency: Learning Disentangled Representation
  for Multi-Modal Time-Series Sensing Signals
From Orthogonality to Dependency: Learning Disentangled Representation for Multi-Modal Time-Series Sensing Signals
Ruichu Cai
Zhifan Jiang
Zijian Li
Weilin Chen
Xuexin Chen
Zhifeng Hao
Yifan Shen
Guan-Hong Chen
Kun Zhang
24
1
0
25 May 2024
Unified Multi-modal Diagnostic Framework with Reconstruction
  Pre-training and Heterogeneity-combat Tuning
Unified Multi-modal Diagnostic Framework with Reconstruction Pre-training and Heterogeneity-combat Tuning
Yupei Zhang
Li Pan
Qiushi Yang
Tan Li
Zhen Chen
18
1
0
09 Apr 2024
SyncMask: Synchronized Attentional Masking for Fashion-centric
  Vision-Language Pretraining
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
Chull Hwan Song
Taebaek Hwang
Jooyoung Yoon
Shunghyun Choi
Yeong Hyeon Gu
19
4
0
01 Apr 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
62
11
0
05 Mar 2024
Vision-Language Models for Medical Report Generation and Visual Question
  Answering: A Review
Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review
Iryna Hartsock
Ghulam Rasool
38
60
0
04 Mar 2024
Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy,
  Advances, and Outlook
Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook
Xingchen Zou
Yibo Yan
Xixuan Hao
Yuehong Hu
Haomin Wen
...
Junbo Zhang
Yong Li
Tianrui Li
Yu Zheng
Yuxuan Liang
HAI
AI4TS
43
35
0
29 Feb 2024
Enhancing EEG-to-Text Decoding through Transferable Representations from
  Pre-trained Contrastive EEG-Text Masked Autoencoder
Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder
Jiaqi Wang
Zhenxi Song
Zhengyu Ma
Xipeng Qiu
Min Zhang
Zhiguo Zhang
29
5
0
27 Feb 2024
Enhancing the vision-language foundation model with key semantic
  knowledge-emphasized report refinement
Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement
Cheng Li
Weijian Huang
Hao Yang
Jiarun Liu
Shanshan Wang
MedIm
17
3
0
21 Jan 2024
Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in
  Remote Sensing
Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing
Jakob Hackstein
Gencer Sumbul
Kai Norman Clasen
Begum Demir
12
7
0
15 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision
  and Beyond
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun-Xiong Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
29
13
0
31 Dec 2023
Mask Grounding for Referring Image Segmentation
Mask Grounding for Referring Image Segmentation
Yong Xien Chng
Henry Zheng
Yizeng Han
Xuchong Qiu
Gao Huang
ISeg
ObjD
19
15
0
19 Dec 2023
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
Yizhou Wang
YiXuan Wu
Shixiang Tang
Weizhen He
Xun Guo
...
Lei Bai
Rui Zhao
Jian Wu
Tong He
Wanli Ouyang
VLM
26
14
0
04 Dec 2023
Point Cloud Self-supervised Learning via 3D to Multi-view Masked
  Autoencoder
Point Cloud Self-supervised Learning via 3D to Multi-view Masked Autoencoder
Zhimin Chen
Yingwei Li
Longlong Jing
Liang Yang
Bing Li
3DPC
29
9
0
17 Nov 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with
  Modality Collaboration
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
116
367
0
07 Nov 2023
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation
  for Grounding-Based Vision and Language Models
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models
Jingru Yi
Burak Uzkent
Oana Ignat
Zili Li
Amanmeet Garg
Xiang Yu
Linda Liu
VLM
17
1
0
05 Nov 2023
FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals
  in Factorized Orthogonal Latent Space
FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space
Shengzhong Liu
Tomoyoshi Kimura
Dongxin Liu
Ruijie Wang
Jinyang Li
Suhas Diggavi
Mani B. Srivastava
Tarek F. Abdelzaher
AI4TS
41
21
0
30 Oct 2023
VeCLIP: Improving CLIP Training via Visual-enriched Captions
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai
Haotian Zhang
Bowen Zhang
Wentao Wu
Haoping Bai
...
Zhe Gan
Jiulong Shan
Chen-Nee Chuah
Yinfei Yang
Meng Cao
CLIP
VLM
24
26
0
11 Oct 2023
Continual Contrastive Spoken Language Understanding
Continual Contrastive Spoken Language Understanding
Umberto Cappellazzo
Enrico Fini
Muqiao Yang
Daniele Falavigna
A. Brutti
Bhiksha Raj
CLL
23
1
0
04 Oct 2023
MUTEX: Learning Unified Policies from Multimodal Task Specifications
MUTEX: Learning Unified Policies from Multimodal Task Specifications
Rutav Shah
Roberto Martín-Martín
Yuke Zhu
OffRL
39
54
0
25 Sep 2023
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object
  Detection
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
Yifan Xu
Mengdan Zhang
Xiaoshan Yang
Changsheng Xu
ObjD
13
5
0
30 Aug 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and
  Modality-Aware MoE
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
37
9
0
23 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
13
116
0
25 Jul 2023
Global and Local Semantic Completion Learning for Vision-Language
  Pre-training
Global and Local Semantic Completion Learning for Vision-Language Pre-training
Rong-Cheng Tu
Yatai Ji
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
10
1
0
12 Jun 2023
Hard Patches Mining for Masked Image Modeling
Hard Patches Mining for Masked Image Modeling
Haochen Wang
Kaiyou Song
Junsong Fan
Yuxi Wang
Jin Xie
Zhaoxiang Zhang
21
57
0
12 Apr 2023
MAGVLT: Masked Generative Vision-and-Language Transformer
MAGVLT: Masked Generative Vision-and-Language Transformer
Sungwoong Kim
DaeJin Jo
Donghoon Lee
Jongmin Kim
VLM
25
11
0
21 Mar 2023
Understanding and Constructing Latent Modality Structures in Multi-modal
  Representation Learning
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
Qian Jiang
Changyou Chen
Han Zhao
Liqun Chen
Q. Ping
S. D. Tran
Yi Xu
Belinda Zeng
Trishul M. Chilimbi
41
36
0
10 Mar 2023
Advancing Radiograph Representation Learning with Masked Record Modeling
Advancing Radiograph Representation Learning with Masked Record Modeling
Hong-Yu Zhou
Chenyu Lian
Lian-cheng Wang
Yizhou Yu
MedIm
12
54
0
30 Jan 2023
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
Liya Wang
A. Tien
22
6
0
28 Jan 2023
12
Next