EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2022

14 November 2022

Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 579 papers shown

EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning

194

22 Apr 2024

Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers

Georgios Pantazopoulos

204

21 Apr 2024

Dynamic in Static: Hybrid Visual Correspondence for Self-Supervised Video Object Segmentation

Yazhou Yao

246

21 Apr 2024

BLINK: Multimodal Large Language Models Can See but Not Perceive

564

305

18 Apr 2024

Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors

Joao Luzio

Alexandre Bernardino

Plinio Moreno

159

16 Apr 2024

MEEL: Multi-Modal Event Evolution Learning

174

16 Apr 2024

HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision

Siddhant Bansal

Michael Wray

Dima Damen

216

15 Apr 2024

GLID: Pre-training a Generalist Encoder-Decoder Vision Model

202

11 Apr 2024

BRAVE: Broadening the visual encoding of vision-language models

European Conference on Computer Vision (ECCV), 2024

296

10 Apr 2024

SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving

...

238

10 Apr 2024

Monocular 3D lane detection for Autonomous Driving: Recent Achievements, Challenges, and Outlooks

Yuxuan Liu

292

10 Apr 2024

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

418

09 Apr 2024

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Ser-Nam Lim

356

180

08 Apr 2024

Progressive Alignment with VLM-LLM Feature to Augment Defect Classification for the ASE Dataset

Chih-Chung Hsu

Chia-Ming Lee

Chun-Hung Sun

Kuang-Ming Wu

156

08 Apr 2024

RoboMP²: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

231

07 Apr 2024

Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation

IEEE Transactions on Medical Imaging (IEEE TMI), 2024

264

03 Apr 2024

What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

255

03 Apr 2024

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

Computer Vision and Pattern Recognition (CVPR), 2024

Liang-Chieh Chen

411

02 Apr 2024

Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning

208

01 Apr 2024

Siamese Vision Transformers are Scalable Audio-visual Learners

Yan-Bo Lin

Gedas Bertasius

267

28 Mar 2024

Toward Interactive Regional Understanding in Vision-Large Language Models

304

27 Mar 2024

Elysium: Exploring Object-level Perception in Videos via MLLM

315

25 Mar 2024

If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions

305

25 Mar 2024

A Multimodal Approach for Cross-Domain Image Retrieval

Lucas Iijima

Tania Stathaki

213

22 Mar 2024

MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation

304

21 Mar 2024

Improved Baselines for Data-efficient Perceptual Augmentation of LLMs

313

20 Mar 2024

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

Xingyuan Dai

Yisheng Lv

216

20 Mar 2024

When Do We Not Need Larger Vision Models?

407

19 Mar 2024

VisualCritic: Making LMMs Perceive Visual Quality Like Humans

243

19 Mar 2024

ViTGaze: Gaze Following with Interaction Features in Vision Transformers

216

19 Mar 2024

Fusion Transformer with Object Mask Guidance for Image Forgery Analysis

Dimitrios Karageorgiou

Giorgos Kordopatis-Zilos

Symeon Papadopoulos

ViT

195

18 Mar 2024

Better (pseudo-)labels for semi-supervised instance segmentation

171

18 Mar 2024

Depth-induced Saliency Comparison Network for Diagnosis of Alzheimer's Disease via Jointly Analysis of Visual Stimuli and Eye Movements

124

15 Mar 2024

Knowledge Condensation and Reasoning for Knowledge-based VQA

...

186

15 Mar 2024

UniCode: Learning a Unified Codebook for Multimodal Large Language Models

European Conference on Computer Vision (ECCV), 2024

219

14 Mar 2024

MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning

239

13 Mar 2024

ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

Computer Vision and Pattern Recognition (CVPR), 2024

417

127

12 Mar 2024

FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks

Muhammad Gul Zain Ali Khan

Muhammad Ferjad Naeem

F. Tombari

Luc Van Gool

Didier Stricker

Muhammad Zeshan Afzal

VLM CLIP

198

11 Mar 2024

VLM-PL: Advanced Pseudo Labeling Approach for Class Incremental Object Detection via Vision-Language Model

330

08 Mar 2024

Spatiotemporal Predictive Pre-training for Robotic Motor Control

Gangshan Wu

369

08 Mar 2024

Embodied Understanding of Driving Scenarios

European Conference on Computer Vision (ECCV), 2024

Yu Qiao

254

07 Mar 2024

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

...

Hsin-Ying Lee

Ming-Hsuan Yang

366

338

29 Feb 2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

...

Yu Qiao

318

29 Feb 2024

VideoMAC: Video Masked Autoencoders Meet ConvNets

Yazhou Yao

243

29 Feb 2024

Vision Transformers with Natural Language Semantics

153

27 Feb 2024

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

...

Yu Qiao

Mingyu Ding

Ping Luo

249

25 Feb 2024

Uncertainty-Aware Evaluation for Vision-Language Models

436

22 Feb 2024

SoMeLVLM: A Large Vision Language Model for Social Media Processing

Xuanjing Huang

217

20 Feb 2024

VideoPrism: A Foundational Visual Encoder for Video Understanding

...

386

20 Feb 2024

Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability

Ying Tai

...

Tiejun Huang

251

19 Feb 2024