EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2023

14 November 2022

Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 579 papers shown

Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark

...

609

01 Jul 2025

HAWAII: Hierarchical Visual Knowledge Transfer for Efficient Vision-Language Models

180

23 Jun 2025

LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models

218

20 Jun 2025

DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs

An-Zi Yen

331

13 Jun 2025

Beyond Overconfidence: Foundation Models Redefine Calibration in Deep Neural Networks

260

11 Jun 2025

Leveraging Depth and Language for Open-Vocabulary Domain-Generalized Semantic Segmentation

298

11 Jun 2025

When Kernels Multiply, Clusters Unify: Fusing Embeddings with the Kronecker Product

Youqi Wu

Jingwei Zhang

Farzan Farnia

233

10 Jun 2025

ARGUS: Hallucination and Omission Evaluation in Video-LLMs

291

09 Jun 2025

The State-of-the-Art in Lifelog Retrieval: A Review of Progress at the ACM Lifelog Search Challenge Workshop 2022-24

...

163

07 Jun 2025

Aligning Multimodal Representations through an Information Bottleneck

Antonio Almudévar

José Miguel Hernández-Lobato

296

05 Jun 2025

Fighting Fire with Fire (F3): A Training-free and Efficient Visual Adversarial Example Purification Method in LVLMs

342

01 Jun 2025

The Security Threat of Compressed Projectors in Large Vision-Language Models

149

31 May 2025

S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation

Computer Vision and Pattern Recognition (CVPR), 2025

...

297

30 May 2025

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces

...

205

30 May 2025

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Computer Vision and Pattern Recognition (CVPR), 2025

335

29 May 2025

Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration

474

27 May 2025

The Missing Point in Vision Transformers for Universal Image Segmentation

Konstantinos N. Plataniotis

Arash Mohammadi

ViT ISeg

337

26 May 2025

FastCAV: Efficient Computation of Concept Activation Vectors for Explaining Deep Neural Networks

189

23 May 2025

Semantic segmentation with reward

525

23 May 2025

DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval

...

464

23 May 2025

NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment

...

387

22 May 2025

Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval

Siting Li

Xiang Gao

Simon Shaolei Du

459

21 May 2025

Exploring The Visual Feature Space for Multimodal Neural Decoding

Weihao Xia

Steven Chacko

289

21 May 2025

Know When to Abstain: Optimal Selective Classification with Likelihood Ratios

Alvin Heng

Harold Soh

364

21 May 2025

Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives

IEEE Geoscience and Remote Sensing Magazine (GRSM), 2025

398

20 May 2025

Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding

396

19 May 2025

X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP

486

08 May 2025

Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions

...

382

04 May 2025

DEEMO: De-identity Multimodal Emotion Recognition and Reasoning

314

28 Apr 2025

MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation

International Conference on Learning Representations (ICLR), 2025

248

20 Apr 2025

Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation

416

17 Apr 2025

Perception Encoder: The best visual embeddings are not at the output of the network

Daniel Bolya

Po-Yao (Bernie) Huang

...

Christoph Feichtenhofer

ObjD VOS

678

118

17 Apr 2025

Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis

197

16 Apr 2025

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

237

14 Apr 2025

CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates

265

14 Apr 2025

Enhancing Multi-task Learning Capability of Medical Generalist Foundation Model via Image-centric Multi-annotation Data

294

14 Apr 2025

FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations

Cheng-Yu Hsieh

Pavan Kumar Anasosalu Vasu

966

11 Apr 2025

VC-LLM: Automated Advertisement Video Creation from Raw Footage using Multi-modal LLMs

183

08 Apr 2025

REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding

277

07 Apr 2025

Rip Current Segmentation: A Novel Benchmark and YOLOv8 Baseline Results

388

03 Apr 2025

Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery

252

03 Apr 2025

Enhanced Cross-modal 3D Retrieval via Tri-modal Reconstruction

Junlong Ren

Hao Wang

319

02 Apr 2025

RipVIS: Rip Currents Video Instance Segmentation Benchmark for Beach Monitoring and Safety

Computer Vision and Pattern Recognition (CVPR), 2025

381

01 Apr 2025

Evaluating Text-to-Image and Text-to-Video Synthesis with a Conditional Fréchet Distance

339

27 Mar 2025

382

26 Mar 2025

Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders

285

25 Mar 2025

Scaling Vision Pre-Training to 4K Resolution

Computer Vision and Pattern Recognition (CVPR), 2025

...

906

25 Mar 2025

Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection

Computer Vision and Pattern Recognition (CVPR), 2025

311

21 Mar 2025

REVAL: A Comprehension Evaluation on Reliability and Values of Large Vision-Language Models

338

20 Mar 2025

Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance

952

20 Mar 2025