
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2023

14 November 2022


Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 579 papers shown

Visual Position Prompt for MLLM based Visual Grounding

529

19 Mar 2025

Exploring Disparity-Accuracy Trade-offs in Face Recognition Systems: The Role of Datasets, Architectures, and Loss Functions

International Conference on Web and Social Media (ICWSM), 2025

142

18 Mar 2025

CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model

287

13 Mar 2025

Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization

Zongshang Pang

Mayu Otani

Yuta Nakashima

335

12 Mar 2025

Multi-Modal Foundation Models for Computational Pathology: A Survey

444

12 Mar 2025

Scale-Aware Pre-Training for Human-Centric Visual Perception: Enabling Lightweight and Generalizable Models

265

11 Mar 2025

Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking

Computer Vision and Pattern Recognition (CVPR), 2025

211

09 Mar 2025

StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition

927

08 Mar 2025

Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training

446

04 Mar 2025

Generalizable Prompt Learning of CLIP: A Brief Overview

1.4K

03 Mar 2025

Re-Imagining Multimodal Instruction Tuning: A Representation View

International Conference on Learning Representations (ICLR), 2025

...

1.1K

02 Mar 2025

MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations

Computer Vision and Pattern Recognition (CVPR), 2025

539

02 Mar 2025

Streaming Video Question-Answering with In-context Video KV-Cache Retrieval

International Conference on Learning Representations (ICLR), 2025

209

01 Mar 2025

Towards High-performance Spiking Transformers from ANN to SNN Conversion

ACM Multimedia (MM), 2024

419

28 Feb 2025

Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models

Computer Vision and Pattern Recognition (CVPR), 2025

Zhaoyi Liu

Huan Zhang

AAML

703

25 Feb 2025

UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting

International Conference on Learning Representations (ICLR), 2025

Michael C. Kampffmeyer

Hang Xu

Xiaodan Liang

3DGS

340

25 Feb 2025

Pretrained Image-Text Models are Secretly Video Captioners

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

485

20 Feb 2025

VAQUUM: Are Vague Quantifiers Grounded in Visual Data?

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Hugh Mee Wong

Rick Nouwen

Albert Gatt

466

17 Feb 2025

Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions

International Conference on Web and Social Media (ICWSM), 2025

Ming Shan Hee

Roy Ka-wei Lee

VLM

243

16 Feb 2025

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

318

12 Feb 2025

HCMRM: A High-Consistency Multimodal Relevance Model for Search Ads

The Web Conference (WWW), 2025

221

09 Feb 2025

UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation

International Conference on Learning Representations (ICLR), 2025

432

04 Feb 2025

Towards Robust Multimodal Large Language Models Against Jailbreak Attacks

338

02 Feb 2025

Vision-Language Model Selection and Reuse for Downstream Adaptation

362

30 Jan 2025

Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink

245

28 Jan 2025

Rethinking Encoder-Decoder Flow Through Shared Structures

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

248

24 Jan 2025

ReasVQA: Advancing VideoQA with Imperfect Reasoning Process

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

265

23 Jan 2025

Patent Figure Classification using Large Vision-language Models

European Conference on Information Retrieval (ECIR), 2025

Sushil Awale

Eric Müller-Budack

Ralph Ewerth

210

22 Jan 2025

TeD-Loc: Text Distillation for Weakly Supervised Object Localization

427

22 Jan 2025

Sublinear Variational Optimization of Gaussian Mixture Models with Millions to Billions of Parameters

307

21 Jan 2025

Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection

625

20 Jan 2025

A Comprehensive Survey of Foundation Models in Medicine

IEEE Reviews in Biomedical Engineering (RBME), 2024

767

17 Jan 2025

EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision

268

14 Jan 2025

Concept Matching with Agent for Out-of-Distribution Detection

320

08 Jan 2025

ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers

179

31 Dec 2024

A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine

Information Fusion (Inf. Fusion), 2024

450

31 Dec 2024

Towards Visual Grounding: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

969

28 Dec 2024

When SAM2 Meets Video Shadow and Mirror Detection

Leiping Jie

VLM

216

26 Dec 2024

Retention Score: Quantifying Jailbreak Risks for Vision Language Models

AAAI Conference on Artificial Intelligence (AAAI), 2024

188

23 Dec 2024

Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning

AAAI Conference on Artificial Intelligence (AAAI), 2024

298

18 Dec 2024

GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding

Computer Vision and Pattern Recognition (CVPR), 2024

485

17 Dec 2024

DINO-Foresight: Looking into the Future with DINO

613

16 Dec 2024

Neptune: The Long Orbit to Benchmarking Long Video Understanding

...

445

12 Dec 2024

Mixture of Physical Priors Adapter for Parameter-Efficient Fine-Tuning

255

03 Dec 2024

HandOS: 3D Hand Reconstruction in One Stage

Computer Vision and Pattern Recognition (CVPR), 2024

492

02 Dec 2024

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models

Computer Vision and Pattern Recognition (CVPR), 2024

393

02 Dec 2024

SEAL: Semantic Attention Learning for Long Video Representation

Computer Vision and Pattern Recognition (CVPR), 2024

628

02 Dec 2024

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

466

28 Nov 2024

NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects?

322

26 Nov 2024

Edge Weight Prediction For Category-Agnostic Pose Estimation

Or Hirschorn

S. Avidan

271

25 Nov 2024