
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2022

14 November 2022


Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 579 papers shown

Can Large Multimodal Models Uncover Deep Semantics Behind Images?

Qingxiu Dong

Zhifang Sui

186

17 Feb 2024

II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering

210

16 Feb 2024

Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering

David Romero

Thamar Solorio

292

16 Feb 2024

Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance

321

13 Feb 2024

VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

298

12 Feb 2024

Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy

International Conference on Learning Representations (ICLR), 2024

Simon Ging

M. A. Bravo

Thomas Brox

VLM

401

11 Feb 2024

Large Language Models for Captioning and Retrieving Remote Sensing Images

210

09 Feb 2024

Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images

Kathleen C. Fraser

S. Kiritchenko

271

08 Feb 2024

Question Aware Vision Transformer for Multimodal Reasoning

299

08 Feb 2024

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

317

06 Feb 2024

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

International Conference on Machine Learning (ICML), 2024

Kun Xu

...

262

05 Feb 2024

Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

IEEE Transactions on Intelligent Vehicles (TIV), 2024

...

Yi Yang

411

05 Feb 2024

GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering

259

04 Feb 2024

Region-Based Representations Revisited

Michal Shlapentokh-Rothman

487

04 Feb 2024

Can MLLMs Perform Text-to-Image In-Context Learning?

263

02 Feb 2024

Hybrid Quantum Vision Transformers for Event Classification in High Energy Physics

...

Konstantin T. Matchev

Katia Matcheva

294

01 Feb 2024

ControlCap: Controllable Region-level Captioning

425

31 Jan 2024

Computer Vision for Primate Behavior Analysis in the Wild

...

406

29 Jan 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

...

Conghui He

Xingcheng Zhang

Yu Qiao

Dahua Lin

Yuan Liu

VLM MLLM

370

344

29 Jan 2024

VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models

337

29 Jan 2024

MM-LLMs: Recent Advances in MultiModal Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

512

335

24 Jan 2024

STICKERCONV: Generating Multimodal Empathetic Responses from Scratch

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Shi Feng

197

20 Jan 2024

Image Safeguarding: Reasoning with Conditional Vision Language Model and Obfuscating Unsafe Content Counterfactually

AAAI Conference on Artificial Intelligence (AAAI), 2024

Mazal Bethany

Brandon Wherry

Nishant Vishwamitra

Peyman Najafirad

DiffM

134

19 Jan 2024

Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering

390

19 Jan 2024

OMG-Seg: Is One Model Good Enough For All Segmentation?

Xiangtai Li

Henghui Ding

311

106

18 Jan 2024

Supervised Fine-tuning in turn Improves Visual Foundation Models

Chun Yuan

Ying Shan

VLM CLIP

253

18 Jan 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

...

Yu Qiao

240

18 Jan 2024

SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model

Yangfan Zhan

Zhitong Xiong

Yuan Yuan

MLLM

254

120

18 Jan 2024

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

International Conference on Machine Learning (ICML), 2024

485

1,378

17 Jan 2024

Beyond Anti-Forgetting: Multimodal Continual Instruction Tuning with Positive Forward Transfer

344

17 Jan 2024

UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding

European Conference on Computer Vision (ECCV), 2024

...

193

12 Jan 2024

Video Anomaly Detection and Explanation via Large Language Models

Hui Lv

Qianru Sun

253

11 Jan 2024

Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics

Hao Zhao

215

10 Jan 2024

Revisiting Adversarial Training at Scale

Computer Vision and Pattern Recognition (CVPR), 2024

Zeyu Wang

Xianhang Li

Hongru Zhu

Cihang Xie

427

09 Jan 2024

Effective pruning of web-scale datasets based on complexity of concept clusters

International Conference on Learning Representations (ICLR), 2024

Wieland Brendel

297

09 Jan 2024

Denoising Vision Transformers

Yue Wang

247

05 Jan 2024

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

Computer Vision and Pattern Recognition (CVPR), 2024

510

04 Jan 2024

Masked Modeling for Self-supervised Representation Learning on Vision and Beyond

Siyuan Li

Luyuan Zhang

Zedong Wang

Di Wu

Lirong Wu

...

Jun Xia

Cheng Tan

Yang Liu

Baigui Sun

Stan Z. Li

SSL

300

31 Dec 2023

FerKD: Surgical Label Adaptation for Efficient Distillation

IEEE International Conference on Computer Vision (ICCV), 2023

Zhiqiang Shen

272

29 Dec 2023

Video Understanding with Large Language Models: A Survey

...

720

170

29 Dec 2023

Learning Vision from Models Rivals Learning Vision from Data

Computer Vision and Pattern Recognition (CVPR), 2023

279

28 Dec 2023

MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices

...

Chunhua Shen

312

28 Dec 2023

ChartBench: A Benchmark for Complex Visual Reasoning in Charts

Zhengzhuo Xu

Sinan Du

Yiyan Qi

Chengjin Xu

Chun Yuan

Jian Guo

440

26 Dec 2023

FoodLMM: A Versatile Food Assistant using Large Multi-modal Model

Chong-Wah Ngo

267

22 Dec 2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Weijie Su

...

Ping Luo

Yu Qiao

641

2,210

21 Dec 2023

GSVA: Generalized Segmentation via Multimodal Large Language Models

Computer Vision and Pattern Recognition (CVPR), 2023

Gao Huang

597

125

15 Dec 2023

General Object Foundation Model for Images and Videos at Scale

Computer Vision and Pattern Recognition (CVPR), 2023

343

14 Dec 2023

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

363

217

14 Dec 2023

ViLA: Efficient Video-Language Alignment for Video Question Answering

European Conference on Computer Vision (ECCV), 2023

325

13 Dec 2023

Building Universal Foundation Models for Medical Image Analysis with Spatially Adaptive Networks

218

12 Dec 2023