ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.05175
  4. Cited By
MVP: Multimodality-guided Visual Pre-training

MVP: Multimodality-guided Visual Pre-training

10 March 2022
Longhui Wei
Lingxi Xie
Wen-gang Zhou
Houqiang Li
Qi Tian
ArXivPDFHTML

Papers citing "MVP: Multimodality-guided Visual Pre-training"

50 / 84 papers shown
Title
Dynamic Relation Inference via Verb Embeddings
Dynamic Relation Inference via Verb Embeddings
Omri Suissa
Muhiim Ali
Ariana Azarbal
Hui Shen
Shekhar Pradhan
41
0
0
17 Mar 2025
Quantum EigenGame for excited state calculation
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
48
1
0
17 Mar 2025
The Dynamic Duo of Collaborative Masking and Target for Advanced Masked
  Autoencoder Learning
The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning
Shentong Mo
35
0
0
23 Dec 2024
PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image
  Modeling
PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling
Zhong-Yu Li
Yunheng Li
Deng-Ping Fan
Ming-Ming Cheng
64
0
0
24 Nov 2024
DiffSTR: Controlled Diffusion Models for Scene Text Removal
DiffSTR: Controlled Diffusion Models for Scene Text Removal
Sanhita Pathak
V. Kaushik
Brejesh Lall
DiffM
25
0
0
29 Oct 2024
Connecting Joint-Embedding Predictive Architecture with Contrastive
  Self-supervised Learning
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning
Shentong Mo
Shengbang Tong
32
1
0
25 Oct 2024
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
Yin Xie
Kaicheng Yang
Ninghua Yang
Weimo Deng
Xiangzi Dai
...
Yumeng Wang
Xiang An
Yongle Zhao
Ziyong Feng
Jiankang Deng
MLLM
VLM
40
1
0
18 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture
Denoising with a Joint-Embedding Predictive Architecture
Dengsheng Chen
Jie Hu
Xiaoming Wei
Enhua Wu
DiffM
47
2
0
02 Oct 2024
Revisiting Prompt Pretraining of Vision-Language Models
Revisiting Prompt Pretraining of Vision-Language Models
Zhenyuan Chen
Lingfeng Yang
Shuo Chen
Zhaowei Chen
Jiajun Liang
Xiang Li
MLLM
VPVLM
VLM
33
1
0
10 Sep 2024
UNIC: Universal Classification Models via Multi-teacher Distillation
UNIC: Universal Classification Models via Multi-teacher Distillation
Mert Bulent Sariyildiz
Philippe Weinzaepfel
Thomas Lucas
Diane Larlus
Yannis Kalantidis
29
6
0
09 Aug 2024
ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive
  Language-Image Pre-traning Model
ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-traning Model
Yifan Chen
Xiaozhen Qiao
Zhe Sun
Xuelong Li
VLM
37
3
0
08 Aug 2024
QPT V2: Masked Image Modeling Advances Visual Scoring
QPT V2: Masked Image Modeling Advances Visual Scoring
Qizhi Xie
Kun Yuan
Yunpeng Qu
Mingda Wu
Ming-hui Sun
Chao Zhou
Jihong Zhu
32
3
0
23 Jul 2024
SemanticMIM: Marring Masked Image Modeling with Semantics Compression
  for General Visual Representation
SemanticMIM: Marring Masked Image Modeling with Semantics Compression for General Visual Representation
Yike Yuan
Huanzhang Dou
Fengjun Guo
Xi Li
29
2
0
15 Jun 2024
OVMR: Open-Vocabulary Recognition with Multi-Modal References
OVMR: Open-Vocabulary Recognition with Multi-Modal References
Zehong Ma
Shiliang Zhang
Longhui Wei
Qi Tian
VLM
36
0
0
07 Jun 2024
Efficient Pretraining Model based on Multi-Scale Local Visual Field
  Feature Reconstruction for PCB CT Image Element Segmentation
Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation
Chen Chen
Kai Qiao
Jie Yang
Jian Chen
Bin Yan
22
0
0
09 May 2024
EVA-X: A Foundation Model for General Chest X-ray Analysis with
  Self-supervised Learning
EVA-X: A Foundation Model for General Chest X-ray Analysis with Self-supervised Learning
Jingfeng Yao
Xinggang Wang
Yuehao Song
Huangxuan Zhao
Jun Ma
Yajie Chen
Wenyu Liu
Bo Wang
ViT
34
5
0
08 May 2024
ViTGaze: Gaze Following with Interaction Features in Vision Transformers
ViTGaze: Gaze Following with Interaction Features in Vision Transformers
Yuehao Song
Xinggang Wang
Jingfeng Yao
Wenyu Liu
Jinglin Zhang
Xiangmin Xu
ViT
44
1
0
19 Mar 2024
Masked Modeling for Self-supervised Representation Learning on Vision
  and Beyond
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun-Xiong Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
29
13
0
31 Dec 2023
Morphing Tokens Draw Strong Masked Image Models
Morphing Tokens Draw Strong Masked Image Models
Taekyung Kim
Byeongho Heo
Dongyoon Han
44
3
0
30 Dec 2023
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual
  Test-Time Adaptation
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu
Ran Xu
Senqiao Yang
Renrui Zhang
Qizhe Zhang
Zehui Chen
Yandong Guo
Shanghang Zhang
TTA
27
10
0
19 Dec 2023
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the
  Generative Artificial Intelligence (AI) Research Landscape
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
79
46
0
18 Dec 2023
Towards Context-Stable and Visual-Consistent Image Inpainting
Towards Context-Stable and Visual-Consistent Image Inpainting
Yikai Wang
Chenjie Cao
Yanwei Fu
DiffM
37
2
0
08 Dec 2023
A brief introduction to a framework named Multilevel
  Guidance-Exploration Network
A brief introduction to a framework named Multilevel Guidance-Exploration Network
Guoqing Yang
Zhiming Luo
Jianzhe Gao
Yingxin Lai
Kun Yang
Yifan He
Shaozi Li
3DH
24
0
0
07 Dec 2023
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding
Yizhou Wang
Ruiyi Zhang
Haoliang Wang
Uttaran Bhattacharya
Yun Fu
Gang Wu
MLLM
24
10
0
04 Dec 2023
Improve Supervised Representation Learning with Masked Image Modeling
Improve Supervised Representation Learning with Masked Image Modeling
Kaifeng Chen
Daniel M. Salz
Huiwen Chang
Kihyuk Sohn
Dilip Krishnan
Mojtaba Seyedhosseini
SSL
ViT
32
2
0
01 Dec 2023
HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception
HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception
Junkun Yuan
Xinyu Zhang
Hao Zhou
Jian Wang
Zhongwei Qiu
...
Junyu Han
Errui Ding
Lanfen Lin
Fei Wu
Jingdong Wang
28
18
0
31 Oct 2023
Decoupling Common and Unique Representations for Multimodal
  Self-supervised Learning
Decoupling Common and Unique Representations for Multimodal Self-supervised Learning
Yi Wang
C. Albrecht
Nassim Ait Ali Braham
Chenying Liu
Zhitong Xiong
Xiaoxiang Zhu
SSL
19
16
0
11 Sep 2023
Empowering Low-Light Image Enhancer through Customized Learnable Priors
Empowering Low-Light Image Enhancer through Customized Learnable Priors
Naishan Zheng
Man Zhou
Yanmeng Dong
Xiangyu Rui
Jie Huang
Chongyi Li
Fengmei Zhao
31
28
0
05 Sep 2023
RevColV2: Exploring Disentangled Representations in Masked Image
  Modeling
RevColV2: Exploring Disentangled Representations in Masked Image Modeling
Qi Han
Yuxuan Cai
Xiangyu Zhang
25
7
0
02 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
Contrastive Feature Masking Open-Vocabulary Vision Transformer
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
18
27
0
02 Sep 2023
Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from
  Stable Diffusion
Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from Stable Diffusion
Zixuan Ni
Longhui Wei
Jiacheng Li
Siliang Tang
Yueting Zhuang
Qi Tian
DiffM
23
21
0
02 Aug 2023
VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive
  Representation
VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation
Zekun Qi
Muzhou Yu
Runpei Dong
Kaisheng Ma
3DPC
18
11
0
28 Jul 2023
CLIP-KD: An Empirical Study of CLIP Model Distillation
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang
Zhulin An
Libo Huang
Junyu Bi
Xinqiang Yu
Hansheng Yang
Boyu Diao
Yongjun Xu
VLM
21
27
0
24 Jul 2023
Mining Conditional Part Semantics with Occluded Extrapolation for Human-Object Interaction Detection
Guangzhi Wang
Yangyang Guo
Mohan S. Kankanhalli
24
0
0
19 Jul 2023
MOCA: Self-supervised Representation Learning by Predicting Masked
  Online Codebook Assignments
MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments
Spyros Gidaris
Andrei Bursuc
Oriane Siméoni
Antonín Vobecký
N. Komodakis
Matthieu Cord
Patrick Pérez
SSL
ViT
16
3
0
18 Jul 2023
Hybrid Distillation: Connecting Masked Autoencoders with Contrastive
  Learners
Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners
Bowen Shi
Xiaopeng Zhang
Yaoming Wang
Jin Li
Wenrui Dai
Junni Zou
H. Xiong
Qi Tian
32
4
0
28 Jun 2023
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
Lorenzo Baraldi
Roberto Amoroso
Marcella Cornia
Lorenzo Baraldi
Andrea Pilzer
Rita Cucchiara
36
2
0
12 Jun 2023
SegViTv2: Exploring Efficient and Continual Semantic Segmentation with
  Plain Vision Transformers
SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers
Bowen Zhang
Liyang Liu
Minh Hieu Phan
Zhi Tian
Chunhua Shen
Yifan Liu
ViT
24
28
0
09 Jun 2023
Delving Deeper into Data Scaling in Masked Image Modeling
Delving Deeper into Data Scaling in Masked Image Modeling
Cheng Lu
Xiaojie Jin
Qibin Hou
Jun Hao Liew
Mingg-Ming Cheng
Jiashi Feng
27
4
0
24 May 2023
What Makes for Good Visual Tokenizers for Large Language Models?
What Makes for Good Visual Tokenizers for Large Language Models?
Guangzhi Wang
Yixiao Ge
Xiaohan Ding
Mohan S. Kankanhalli
Ying Shan
MLLM
VLM
19
38
0
20 May 2023
Embrace Limited and Imperfect Training Datasets: Opportunities and
  Challenges in Plant Disease Recognition Using Deep Learning
Embrace Limited and Imperfect Training Datasets: Opportunities and Challenges in Plant Disease Recognition Using Deep Learning
Mingle Xu
H. Kim
Jucheng Yang
A. Fuentes
Yao Meng
Sook Yoon
Taehyun Kim
D. Park
15
17
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited
  Modalities
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
16
114
0
18 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text
  Retrieval
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
24
2
0
13 May 2023
Continual Vision-Language Representation Learning with Off-Diagonal
  Information
Continual Vision-Language Representation Learning with Off-Diagonal Information
Zixuan Ni
Longhui Wei
Siliang Tang
Yueting Zhuang
Qi Tian
VLM
CLL
29
25
0
11 May 2023
Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders
Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders
Heng Pan
Chenyang Liu
Wenxiao Wang
Liejie Yuan
Hongfa Wang
Zhifeng Li
W. Liu
VLM
20
3
0
25 Apr 2023
Diffusion Models as Masked Autoencoders
Diffusion Models as Masked Autoencoders
Chen Wei
K. Mangalam
Po-Yao (Bernie) Huang
Yanghao Li
Haoqi Fan
Hu Xu
Huiyu Wang
Cihang Xie
Alan Yuille
Christoph Feichtenhofer
DiffM
SyDa
31
47
0
06 Apr 2023
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
Ximeng Sun
Pengchuan Zhang
Peizhao Zhang
Hardik Shah
Kate Saenko
Xide Xia
VLM
10
20
0
31 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
30
154
0
28 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
38
258
0
20 Mar 2023
Gradient-Regulated Meta-Prompt Learning for Generalizable
  Vision-Language Models
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models
Juncheng Li
Minghe Gao
Longhui Wei
Siliang Tang
Wenqiao Zhang
Meng Li
Wei Ji
Qi Tian
Tat-Seng Chua
Yueting Zhuang
VLM
VPVLM
27
18
0
12 Mar 2023
12
Next