EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2022
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM, CLIP
arXiv: 2211.07636 · HuggingFace (1 upvote) · GitHub (2496★)

Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 579 papers shown
End-to-end Autonomous Driving: Challenges and Frontiers
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Li Chen
Peng Wu
Kashyap Chitta
Bernhard Jaeger
Andreas Geiger
Guoying Gu
3DV
368
578
0
29 Jun 2023
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
William Berrios
Gautam Mittal
Tristan Thrush
Douwe Kiela
Amanpreet Singh
MLLM, VLM
186
65
0
28 Jun 2023
Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners
International Conference on Learning Representations (ICLR), 2023
Bowen Shi
Xiaopeng Zhang
Yaoming Wang
Jin Li
Wenrui Dai
Junni Zou
H. Xiong
Qi Tian
294
9
0
28 Jun 2023
Are aligned neural networks adversarially aligned?
Neural Information Processing Systems (NeurIPS), 2023
Nicholas Carlini
Milad Nasr
Christopher A. Choquette-Choo
Matthew Jagielski
Irena Gao
...
Pang Wei Koh
Daphne Ippolito
Katherine Lee
Florian Tramèr
Ludwig Schmidt
AAML
284
312
0
26 Jun 2023
A Survey on Multimodal Large Language Models
National Science Review (NSR), 2023
Xinglong Mao
Chaoyou Fu
Zhengye Zhang
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM, LRM
455
995
0
23 Jun 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2023
Xiangyu Qi
Kaixuan Huang
Ashwinee Panda
Peter Henderson
Mengdi Wang
Prateek Mittal
AAML
284
267
0
22 Jun 2023
Pushing the Limits of 3D Shape Generation at Scale
Wang Yu
Xuelin Qian
Jingyang Huo
Tiejun Huang
Bo Zhao
Yanwei Fu
271
12
0
20 Jun 2023
Path to Medical AGI: Unify Domain-specific Medical LLMs with the Lowest Cost
medRxiv (medRxiv), 2023
Juexiao Zhou
Preslav Nakov
Xin Gao
LM&MA, AI4CE
226
17
0
19 Jun 2023
Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions
ACM Multimedia (ACM MM), 2023
Dongshuo Yin
Xueting Han
Bin Li
Hao Feng
Jinghua Bai
VPVLM
273
27
0
16 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Peng Xu
Wenqi Shao
Kaipeng Zhang
Shiyang Feng
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM, MLLM
309
230
0
15 Jun 2023
Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions
Grant Sinha
Krishna Parmar
Hilda Azimi
Chi-en Amy Tai
Yuhao Chen
Alexander Wong
Pengcheng Xi
ViT
98
5
0
15 Jun 2023
MOFI: Learning Image Representations from Noisy Entity Annotated Images
International Conference on Learning Representations (ICLR), 2023
Wentao Wu
Aleksei Timofeev
Chen Chen
Bowen Zhang
Kun Duan
...
Yantao Zheng
Jonathon Shlens
Xianzhi Du
Zhe Gan
Yinfei Yang
VLM
239
9
0
13 Jun 2023
VISION Datasets: A Benchmark for Vision-based InduStrial InspectiON
Haoping Bai
Shancong Mou
Tatiana Likhomanenko
R. G. Cinbis
Oncel Tuzel
Ping Huang
Jiulong Shan
Jianjun Shi
Mengsi Cao
VLM
235
36
0
13 Jun 2023
Scalable 3D Captioning with Pretrained Models
Neural Information Processing Systems (NeurIPS), 2023
Tiange Luo
C. Rockwell
Honglak Lee
Justin Johnson
305
213
0
12 Jun 2023
Beyond Detection: Visual Realism Assessment of Deepfakes
Luka Dragar
Peter Peer
Vitomir Štruc
Borut Batagelj
178
5
0
09 Jun 2023
Customizing General-Purpose Foundation Models for Medical Report Generation
Bang-ju Yang
Asif Raza
Yuexian Zou
Tong Zhang
MedIm
173
14
0
09 Jun 2023
Large-scale Dataset Pruning with Dynamic Uncertainty
Muyang He
Shuo Yang
Tiejun Huang
Bo Zhao
340
53
0
08 Jun 2023
Fine-Grained Visual Prompting
Neural Information Processing Systems (NeurIPS), 2023
Lingfeng Yang
Yueze Wang
Xiang Li
Xinlong Wang
Jian Yang
ObjD, VLM
245
98
0
07 Jun 2023
Semantic Segmentation on VSPW Dataset through Contrastive Loss and Multi-dataset Training Approach
Min Yan
Qianxiong Ning
Qian Wang
105
1
0
06 Jun 2023
Adversarial alignment: Breaking the trade-off between the strength of an attack and its relevance to human perception
Drew Linsley
Pinyuan Feng
Thibaut Boissin
A. Ashok
Thomas Fel
Stephanie Olaiya
Thomas Serre
AAML
221
10
0
05 Jun 2023
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hang Zhang
Xin Li
Lidong Bing
MLLM
568
1,485
0
05 Jun 2023
Revisiting the Role of Language Priors in Vision-Language Models
International Conference on Machine Learning (ICML), 2023
Zhiqiu Lin
Xinyue Chen
Deepak Pathak
Pengchuan Zhang
Deva Ramanan
VLM
463
38
0
02 Jun 2023
Consistency-guided Prompt Learning for Vision-Language Models
International Conference on Learning Representations (ICLR), 2023
Shuvendu Roy
Ali Etemad
VLM, VPVLM
307
90
0
01 Jun 2023
StyleGAN knows Normal, Depth, Albedo, and More
Neural Information Processing Systems (NeurIPS), 2023
Anand Bhattad
Daniel McKee
Derek Hoiem
David A. Forsyth
GAN
211
48
0
01 Jun 2023
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
International Conference on Learning Representations (ICLR), 2023
Yi Zhou
Ruibin Yuan
Ge Zhang
Yi Ma
Xingran Chen
...
Yemin Shi
Wen-Fen Huang
Zili Wang
Yi-Ting Guo
Jie Fu
409
229
0
31 May 2023
AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation
Chuhao Jin
Wenhui Tan
Jiange Yang
Bei Liu
Ruihua Song
Limin Wang
Jianlong Fu
LM&Ro, LRM
158
27
0
30 May 2023
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
International Conference on Machine Learning (ICML), 2023
Dachuan Shi
Chaofan Tao
Anyi Rao
Zhendong Yang
Chun Yuan
Yuan Liu
VLM
455
38
0
27 May 2023
ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers
Information Fusion (Inf. Fusion), 2023
J. Yao
Xinggang Wang
Shusheng Yang
Baoyuan Wang
ViT
235
87
0
24 May 2023
Delving Deeper into Data Scaling in Masked Image Modeling
Cheng Lu
Xiaojie Jin
Qibin Hou
Jun Hao Liew
Ming-Ming Cheng
Jiashi Feng
173
6
0
24 May 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
Neural Information Processing Systems (NeurIPS), 2023
Yao Mu
Qinglong Zhang
Mengkang Hu
Wen Wang
Mingyu Ding
Jun Jin
Sijin Yu
Jifeng Dai
Yu Qiao
Ping Luo
LM&Ro, LRM
389
348
0
24 May 2023
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
Neural Information Processing Systems (NeurIPS), 2023
Dongxu Li
Junnan Li
Steven C. H. Hoi
393
461
0
24 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
261
112
0
22 May 2023
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending
IEEE Transactions on Multimedia (IEEE TMM), 2023
Xingjian He
Sihan Chen
Fan Ma
Zhicheng Huang
Xiaojie Jin
Zikang Liu
Dongmei Fu
Yi Yang
Qingbin Liu
Jiashi Feng
VLM, CLIP
293
23
0
22 May 2023
What Makes for Good Visual Tokenizers for Large Language Models?
Guangzhi Wang
Yixiao Ge
Xiaohan Ding
Mohan S. Kankanhalli
Ying Shan
MLLM, VLM
287
45
0
20 May 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Neural Information Processing Systems (NeurIPS), 2023
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
...
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLM, VLM
302
617
0
18 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM, MLLM, ObjD
579
153
0
18 May 2023
MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts
Asian Conference on Computer Vision (ACCV), 2023
Qiuhui Chen
Xinyue Hu
Zirui Wang
Yi Hong
LM&MA, MedIm
173
67
0
18 May 2023
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models
Neural Information Processing Systems (NeurIPS), 2023
Zhimin Chen
Longlong Jing
Yingwei Li
Bing Li
367
49
0
15 May 2023
Self-Chained Image-Language Model for Video Localization and Question Answering
Neural Information Processing Systems (NeurIPS), 2023
Shoubin Yu
Jaemin Cho
Prateek Yadav
Joey Tianyi Zhou
395
199
0
11 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM, VLM
1.4K
2,884
0
11 May 2023
Segment Anything is A Good Pseudo-label Generator for Weakly Supervised Semantic Segmentation
Peng-Tao Jiang
Yuqi Yang
VLM
237
36
0
02 May 2023
A Strong and Reproducible Object Detector with Only Public Datasets
Tianhe Ren
Jianwei Yang
Siyi Liu
Ailing Zeng
Feng Li
Hao Zhang
Hongyang Li
Zhaoyang Zeng
Lei Zhang
ObjD
169
13
0
25 Apr 2023
A Cookbook of Self-Supervised Learning
Randall Balestriero
Mark Ibrahim
Vlad Sobal
Ari S. Morcos
Shashank Shekhar
...
Pierre Fernandez
Amir Bar
Hamed Pirsiavash
Yann LeCun
Micah Goldblum
SyDa, FedML, SSL
428
362
0
24 Apr 2023
SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual Large Language Model
medRxiv (medRxiv), 2023
Juexiao Zhou
Xiao-Zhen He
Liyuan Sun
Jiannan Xu
Preslav Nakov
Yuetan Chu
Longxi Zhou
Xingyu Liao
Bin Zhang
Xin Gao
LM&MA
248
41
0
21 Apr 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
International Conference on Learning Representations (ICLR), 2023
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM, MLLM
465
2,709
0
20 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Edouard Grave
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM, CLIP, SSL
1.1K
5,994
0
14 Apr 2023
On Robustness in Multimodal Learning
Brandon McKinzie
Joseph Cheng
Vaishaal Shankar
Yinfei Yang
Jonathon Shlens
Alexander Toshev
175
4
0
10 Apr 2023
ViT-Calibrator: Decision Stream Calibration for Vision Transformer
AAAI Conference on Artificial Intelligence (AAAI), 2023
Lin Chen
Zhijie Jia
Tian Qiu
Lechao Cheng
Jie Lei
Zunlei Feng
Min-Gyoo Song
304
3
0
10 Apr 2023
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions
Jun Chen
Deyao Zhu
Kilichbek Haydarov
Xiang Li
Mohamed Elhoseiny
264
44
0
09 Apr 2023
V3Det: Vast Vocabulary Visual Detection Dataset
IEEE International Conference on Computer Vision (ICCV), 2023
Yuan Liu
Pan Zhang
Tao Chu
Yuhang Cao
Yujie Zhou
Tong Wu
Sijin Yu
Conghui He
Dahua Lin
VLM, ObjD
317
76
0
07 Apr 2023