ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.05175
  4. Cited By
MVP: Multimodality-guided Visual Pre-training

MVP: Multimodality-guided Visual Pre-training

10 March 2022
Longhui Wei
Lingxi Xie
Wen-gang Zhou
Houqiang Li
Qi Tian
ArXivPDFHTML

Papers citing "MVP: Multimodality-guided Visual Pre-training"

34 / 84 papers shown
Title
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature
  Mimicking
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking
Peng Gao
Renrui Zhang
Rongyao Fang
Ziyi Lin
Hongyang Li
Hongsheng Li
Qiao Yu
19
18
0
09 Mar 2023
Efficient Masked Autoencoders with Self-Consistency
Efficient Masked Autoencoders with Self-Consistency
Zhaowen Li
Yousong Zhu
Zhiyang Chen
Wei Li
Chaoyang Zhao
Rui Zhao
Ming Tang
Jinqiao Wang
45
2
0
28 Feb 2023
Remote Sensing Scene Classification with Masked Image Modeling (MIM)
Remote Sensing Scene Classification with Masked Image Modeling (MIM)
Liya Wang
A. Tien
30
3
0
28 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Xiao Wang
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
24
199
0
20 Feb 2023
MOMA:Distill from Self-Supervised Teachers
MOMA:Distill from Self-Supervised Teachers
Yuan Yao
Nandakishor Desai
M. Palaniswami
26
2
0
04 Feb 2023
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
Liya Wang
A. Tien
35
7
0
28 Jan 2023
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Floris Weers
Vaishaal Shankar
Angelos Katharopoulos
Yinfei Yang
Tom Gunter
CLIP
18
4
0
19 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
RILS: Masked Visual Reconstruction in Language Semantic Space
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
32
11
0
17 Jan 2023
Toward Building General Foundation Models for Language, Vision, and
  Vision-Language Understanding Tasks
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
Xinsong Zhang
Yan Zeng
Jipeng Zhang
Hang Li
VLM
AI4CE
LRM
14
17
0
12 Jan 2023
Disjoint Masking with Joint Distillation for Efficient Masked Image
  Modeling
Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling
Xin Ma
Chang-Shu Liu
Chunyu Xie
Long Ye
Yafeng Deng
Xiang Ji
20
9
0
31 Dec 2022
MAViL: Masked Audio-Video Learners
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
19
50
0
15 Dec 2022
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1
  Accuracy with ViT-B and ViT-L on ImageNet
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Shuyang Gu
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
22
35
0
12 Dec 2022
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token
  Migration
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
Yunjie Tian
Lingxi Xie
Jihao Qiu
Jianbin Jiao
Yaowei Wang
Qi Tian
Qixiang Ye
ViT
24
6
0
23 Nov 2022
CAE v2: Context Autoencoder with CLIP Target
CAE v2: Context Autoencoder with CLIP Target
Xinyu Zhang
Jiahui Chen
Junkun Yuan
Qiang Chen
Jian Wang
...
Jimin Pi
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
VLM
CLIP
34
24
0
17 Nov 2022
MAGE: MAsked Generative Encoder to Unify Representation Learning and
  Image Synthesis
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
Tianhong Li
Huiwen Chang
Shlok Kumar Mishra
Han Zhang
Dina Katabi
Dilip Krishnan
19
151
0
16 Nov 2022
Stare at What You See: Masked Image Modeling without Reconstruction
Stare at What You See: Masked Image Modeling without Reconstruction
Hongwei Xue
Peng Gao
Hongyang Li
Yu Qiao
Hao Sun
Houqiang Li
Jiebo Luo
25
31
0
16 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at
  Scale
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
54
673
0
14 Nov 2022
Towards Sustainable Self-supervised Learning
Towards Sustainable Self-supervised Learning
Shanghua Gao
Pan Zhou
Mingg-Ming Cheng
Shuicheng Yan
CLL
25
7
0
20 Oct 2022
A Unified View of Masked Image Modeling
A Unified View of Masked Image Modeling
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
VLM
52
35
0
19 Oct 2022
Exploring Target Representations for Masked Autoencoders
Exploring Target Representations for Masked Autoencoders
Xingbin Liu
Jinghao Zhou
Tao Kong
Xianming Lin
Rongrong Ji
79
49
0
08 Sep 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked
  Visual Modeling
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
19
63
0
04 Sep 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image
  Pretraining
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
...
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
VLM
32
157
0
25 Aug 2022
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
16
303
0
12 Aug 2022
MILAN: Masked Image Pretraining on Language Assisted Representation
MILAN: Masked Image Pretraining on Language Assisted Representation
Zejiang Hou
Fei Sun
Yen-kuang Chen
Yuan Xie
S. Kung
ViT
26
67
0
11 Aug 2022
Fine-Grained Semantically Aligned Vision-Language Pre-Training
Fine-Grained Semantically Aligned Vision-Language Pre-Training
Juncheng Li
Xin He
Longhui Wei
Long Qian
Linchao Zhu
Lingxi Xie
Yueting Zhuang
Qi Tian
Siliang Tang
VLM
24
78
0
04 Aug 2022
Contrastive Masked Autoencoders are Stronger Vision Learners
Contrastive Masked Autoencoders are Stronger Vision Learners
Zhicheng Huang
Xiaojie Jin
Cheng Lu
Qibin Hou
Mingg-Ming Cheng
Dongmei Fu
Xiaohui Shen
Jiashi Feng
26
147
0
27 Jul 2022
Masked Image Modeling with Denoising Contrast
Masked Image Modeling with Denoising Contrast
Kun Yi
Yixiao Ge
Xiaotong Li
Shusheng Yang
Dian Li
Jianping Wu
Ying Shan
Xiaohu Qie
VLM
30
50
0
19 May 2022
Context Autoencoder for Self-Supervised Representation Learning
Context Autoencoder for Self-Supervised Representation Learning
Xiaokang Chen
Mingyu Ding
Xiaodi Wang
Ying Xin
Shentong Mo
Yunhao Wang
Shumin Han
Ping Luo
Gang Zeng
Jingdong Wang
SSL
27
386
0
07 Feb 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,412
0
11 Nov 2021
Exploring the Diversity and Invariance in Yourself for Visual
  Pre-Training Task
Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task
Longhui Wei
Lingxi Xie
Wen-gang Zhou
Houqiang Li
Qi Tian
SSL
19
3
0
01 Jun 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
298
5,761
0
29 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Improved Baselines with Momentum Contrastive Learning
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen
Haoqi Fan
Ross B. Girshick
Kaiming He
SSL
238
3,359
0
09 Mar 2020
Semantic Understanding of Scenes through the ADE20K Dataset
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
249
1,821
0
18 Aug 2016
Previous
12