EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2022

14 November 2022

Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 579 papers shown

M4-BLIP: Advancing Multi-Modal Media Manipulation Detection through Face-Enhanced Local Analysis

127

01 Dec 2025

DEAL-300K: Diffusion-based Editing Area Localization with a 300K-Scale Dataset and Frequency-Prompted Baseline

28 Nov 2025

Frequency-Aware Token Reduction for Efficient Vision Transformer

188

26 Nov 2025

MuM: Multi-View Masked Image Modeling for 3D Vision

198

21 Nov 2025

NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization

128

11 Nov 2025

Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges

31 Oct 2025

_1: A Boundless Large Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning

...

168

28 Oct 2025

One-Timestep is Enough: Achieving High-performance ANN-to-SNN Conversion via Scale-and-Fire Neurons

109

27 Oct 2025

HyperET: Efficient Training in Hyperbolic Space for Multi-modal Large Language Models

233

23 Oct 2025

Towards Single-Source Domain Generalized Object Detection via Causal Visual Prompts

123

22 Oct 2025

ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder

227

21 Oct 2025

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

156

16 Oct 2025

Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking

Mitchell Keren Taraday

Shahaf Wagner

Chaim Baskin

VLM

110

08 Oct 2025

Emergent AI Surveillance: Overlearned Person Re-Identification and Its Mitigation in Law Enforcement Context

An Thi Nguyen

Radina Stoykova

Eric Arazo

124

07 Oct 2025

Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs

...

174

02 Oct 2025

NeMo: Needle in a Montage for Video-Language Understanding

...

169

29 Sep 2025

SVAC: Scaling Is All You Need For Referring Video Object Segmentation

149

28 Sep 2025

MMPB: It's Time for Multi-Modal Personalization

190

26 Sep 2025

Advancing Metallic Surface Defect Detection via Anomaly-Guided Pretraining on a Large Industrial Dataset

232

23 Sep 2025

MRN: Harnessing 2D Vision Foundation Models for Diagnosing Parkinson's Disease with Limited 3D MR Data

112

22 Sep 2025

SCENEFORGE: Enhancing 3D-text alignment with Structured Scene Compositions

Cristian Sbrolli

Matteo Matteucci

180

19 Sep 2025

RangeSAM: On the Potential of Visual Foundation Models for Range-View represented LiDAR segmentation

285

19 Sep 2025

An Empirical Analysis of VLM-based OOD Detection: Mechanisms, Advantages, and Sensitivity

180

16 Sep 2025

ER-LoRA: Effective-Rank Guided Adaptation for Weather-Generalized Depth Estimation

252

31 Aug 2025

Category-level Text-to-Image Retrieval Improved: Bridging the Domain Gap with Diffusion Models and Vision Encoders

100

29 Aug 2025

MobileCLIP2: Improving Multi-Modal Reinforced Training

Fartash Faghri

Pavan Kumar Anasosalu Vasu

432

28 Aug 2025

Multimodal LLMs See Sentiment

135

23 Aug 2025

From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations

122

21 Aug 2025

Temporal Grounding as a Learning Signal for Referring Video Object Segmentation

Siwon Kim

...

206

16 Aug 2025

Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors?

Xuezheng Chen

Zhengbo Zou

MLLM

14 Aug 2025

Failures to Surface Harmful Contents in Video Large Language Models

161

14 Aug 2025

DoorDet: Semi-Automated Multi-Class Door Detection Dataset via Object Detection and Large Language Models

109

11 Aug 2025

Membership Inference Attacks with False Discovery Rate Control

133

09 Aug 2025

CoCAViT: Compact Vision Transformer with Robust Global Coordination

112

07 Aug 2025

A Survey on Video Temporal Grounding with Multimodal Large Language Model

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

145

07 Aug 2025

Multi-Granularity Feature Calibration via VFM for Domain Generalized Semantic Segmentation

Xinhui Li

Xiaojie Guo

140

05 Aug 2025

Adversarial Attention Perturbations for Large Object Detection Transformers

135

05 Aug 2025

A Multi-Agent System for Complex Reasoning in Radiology Visual Question Answering

199

04 Aug 2025

Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting

200

04 Aug 2025

Set Pivot Learning: Redefining Generalized Segmentation with Vision Foundation Models

123

03 Aug 2025

Rein++: Efficient Generalization and Adaptation for Semantic Segmentation with Vision Foundation Models

175

03 Aug 2025

Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models

135

01 Aug 2025

ART: Adaptive Relation Tuning for Generalized Relation Prediction

134

31 Jul 2025

DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception

Pei Deng

Wenqian Zhou

Hanlin Wu

115

30 Jul 2025

HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models

167

30 Jul 2025

TESPEC: Temporally-Enhanced Self-Supervised Pretraining for Event Cameras

152

29 Jul 2025

The Early Bird Identifies the Worm: You Can't Beat a Head Start in Long-Term Body Re-ID (ECHO-BID)

Thomas M. Metz

Matthew Q. Hill

A. O’Toole

206

23 Jul 2025

Latent Denoising Makes Good Visual Tokenizers

192

21 Jul 2025

ChestGPT: Integrating Large Language Models and Vision Transformers for Disease Detection and Localization in Chest X-Rays

151

04 Jul 2025

Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark

...

596

01 Jul 2025