EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2022

14 November 2022

Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 579 papers shown

End-to-end Autonomous Driving: Challenges and Frontiers

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Li Chen

368

578

29 Jun 2023

Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language

William Berrios

Gautam Mittal

Tristan Thrush

Douwe Kiela

Amanpreet Singh

MLLM VLM

186

28 Jun 2023

Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners

International Conference on Learning Representations (ICLR), 2023

294

28 Jun 2023

Are aligned neural networks adversarially aligned?

Neural Information Processing Systems (NeurIPS), 2023

Nicholas Carlini

Milad Nasr

Christopher A. Choquette-Choo

Matthew Jagielski

Irena Gao

...

Pang Wei Koh

284

312

26 Jun 2023

A Survey on Multimodal Large Language Models

National Science Review (NSR), 2023

Enhong Chen

455

995

23 Jun 2023

Visual Adversarial Examples Jailbreak Aligned Large Language Models

AAAI Conference on Artificial Intelligence (AAAI), 2023

Kaixuan Huang

Mengdi Wang

284

267

22 Jun 2023

Pushing the Limits of 3D Shape Generation at Scale

Xuelin Qian

271

20 Jun 2023

Path to Medical AGI: Unify Domain-specific Medical LLMs with the Lowest Cost

medRxiv (medRxiv), 2023

Juexiao Zhou

Preslav Nakov

Xin Gao

LM&MA AI4CE

226

19 Jun 2023

Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions

ACM Multimedia (ACM MM), 2023

273

16 Jun 2023

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Yu Qiao

Ping Luo

ELM MLLM

309

230

15 Jun 2023

Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions

Yuhao Chen

Pengcheng Xi

15 Jun 2023

MOFI: Learning Image Representations from Noisy Entity Annotated Images

International Conference on Learning Representations (ICLR), 2023

Chen Chen

...

Xianzhi Du

239

13 Jun 2023

VISION Datasets: A Benchmark for Vision-based InduStrial InspectiON

235

13 Jun 2023

Scalable 3D Captioning with Pretrained Models

Neural Information Processing Systems (NeurIPS), 2023

305

213

12 Jun 2023

Beyond Detection: Visual Realism Assessment of Deepfakes

Luka Dragar

Peter Peer

Vitomir Štruc

Borut Batagelj

178

09 Jun 2023

Customizing General-Purpose Foundation Models for Medical Report Generation

Tong Zhang

173

09 Jun 2023

Large-scale Dataset Pruning with Dynamic Uncertainty

340

08 Jun 2023

Fine-Grained Visual Prompting

Neural Information Processing Systems (NeurIPS), 2023

Lingfeng Yang

Yueze Wang

Xiang Li

Xinlong Wang

Jian Yang

ObjD VLM

245

07 Jun 2023

Semantic Segmentation on VSPW Dataset through Contrastive Loss and Multi-dataset Training Approach

Min Yan

Qianxiong Ning

Qian Wang

105

06 Jun 2023

Adversarial alignment: Breaking the trade-off between the strength of an attack and its relevance to human perception

221

05 Jun 2023

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Hang Zhang

Xin Li

Lidong Bing

MLLM

568

1,485

05 Jun 2023

Revisiting the Role of Language Priors in Vision-Language Models

International Conference on Machine Learning (ICML), 2023

463

02 Jun 2023

Consistency-guided Prompt Learning for Vision-Language Models

International Conference on Learning Representations (ICLR), 2023

Shuvendu Roy

Ali Etemad

VLM VPVLM

307

01 Jun 2023

StyleGAN knows Normal, Depth, Albedo, and More

Neural Information Processing Systems (NeurIPS), 2023

Anand Bhattad

211

01 Jun 2023

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

International Conference on Learning Representations (ICLR), 2023

Ge Zhang

...

409

229

31 May 2023

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

158

30 May 2023

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers

International Conference on Machine Learning (ICML), 2023

455

27 May 2023

ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers

Information Fusion (Inf. Fusion), 2023

Shusheng Yang

235

24 May 2023

Delving Deeper into Data Scaling in Masked Image Modeling

173

24 May 2023

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

Neural Information Processing Systems (NeurIPS), 2023

Mingyu Ding

Yu Qiao

Ping Luo

LM&Ro LRM

389

348

24 May 2023

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing

Neural Information Processing Systems (NeurIPS), 2023

Dongxu Li

Junnan Li

Steven C. H. Hoi

393

461

24 May 2023

VideoLLM: Modeling Video Sequence with Large Language Models

Yifei Huang

...

Yi Wang

Yu Qiao

261

112

22 May 2023

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending

IEEE Transactions on Multimedia (IEEE TMM), 2023

Yi Yang

293

22 May 2023

What Makes for Good Visual Tokenizers for Large Language Models?

Ying Shan

287

20 May 2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Neural Information Processing Systems (NeurIPS), 2023

...

Yu Qiao

302

617

18 May 2023

ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Peng Wang

Shijie Wang

Junyang Lin

Shuai Bai

Xiaohuan Zhou

Jingren Zhou

Xinggang Wang

Chang Zhou

VLM MLLM ObjD

579

153

18 May 2023

MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts

Asian Conference on Computer Vision (ACCV), 2023

173

18 May 2023

Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models

Neural Information Processing Systems (NeurIPS), 2023

Zhimin Chen

Longlong Jing

Yingwei Li

Bing Li

367

15 May 2023

Self-Chained Image-Language Model for Video Localization and Question Answering

Neural Information Processing Systems (NeurIPS), 2023

395

199

11 May 2023

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

Neural Information Processing Systems (NeurIPS), 2023

1.4K

2,884

11 May 2023

Segment Anything is A Good Pseudo-label Generator for Weakly Supervised Semantic Segmentation

Peng-Tao Jiang

Yuqi Yang

VLM

237

02 May 2023

A Strong and Reproducible Object Detector with Only Public Datasets

Jianwei Yang

Ailing Zeng

Lei Zhang

169

25 Apr 2023

A Cookbook of Self-Supervised Learning

...

Pierre Fernandez

428

362

24 Apr 2023

SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual Large Language Model

medRxiv (medRxiv), 2023

Xin Gao

248

21 Apr 2023

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

International Conference on Learning Representations (ICLR), 2023

465

2,709

20 Apr 2023

DINOv2: Learning Robust Visual Features without Supervision

Maxime Oquab

...

1.1K

5,994

14 Apr 2023

On Robustness in Multimodal Learning

175

10 Apr 2023

ViT-Calibrator: Decision Stream Calibration for Vision Transformer

AAAI Conference on Artificial Intelligence (AAAI), 2023

304

10 Apr 2023

Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions

264

09 Apr 2023

V3Det: Vast Vocabulary Visual Detection Dataset

IEEE International Conference on Computer Vision (ICCV), 2023

Conghui He

317

07 Apr 2023