EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2023

14 November 2022

Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 579 papers shown

Towards Event-oriented Long Video Understanding

Kun Zhou

Wayne Xin Zhao

Bingning Wang

Weipeng Chen

Ji-Rong Wen

VLM

204

20 Jun 2024

VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning

288

20 Jun 2024

VoCo-LLaMA: Towards Vision Compression with Large Language Models

Yansong Tang

393

18 Jun 2024

Unveiling Encoder-Free Vision-Language Models

Yueze Wang

Xinlong Wang

242

17 Jun 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

Jiaqi Wang

...

Licheng Jiao

262

17 Jun 2024

Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent

Lin Wang

Zhichao Wang

Xiaoying Tang

236

17 Jun 2024

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Zebang Cheng

Zhi-Qi Cheng

Jun-Yan He

Yuxuan Zhou

Kai Wang

Yuxiang Lin

Zheng Lian

Xiaojiang Peng

Alexander G. Hauptmann

MLLM

251

118

17 Jun 2024

ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers

Computer Vision and Pattern Recognition (CVPR), 2024

219

14 Jun 2024

Explore the Limits of Omni-modal Pretraining at Scale

Handong Li

251

13 Jun 2024

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Qin Liu

...

Kai-Wei Chang

Dan Roth

Sheng Zhang

Hoifung Poon

Muhao Chen

VLM

324

111

13 Jun 2024

Comparison Visual Instruction Tuning

Wei Lin

277

13 Jun 2024

Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024

Zeyu Wang

Boning Wang

221

13 Jun 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Zhe Chen

...

Dahua Lin

Yu Qiao

Botian Shi

Conghui He

Jifeng Dai

VLM OffRL

269

12 Jun 2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

Chenyu Yang

Xizhou Zhu

Jinguo Zhu

Weijie Su

Junjie Wang

...

Lewei Lu

Bin Li

Jie Zhou

Yu Qiao

Jifeng Dai

VLM CLIP

200

11 Jun 2024

2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval

Jiajun He

Tomoki Toda

251

10 Jun 2024

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

593

09 Jun 2024

Parameter-Inverted Image Pyramid Networks

Neural Information Processing Systems (NeurIPS), 2024

Xizhou Zhu

Xue Yang

Zhaokai Wang

Hao Li

Jifeng Dai

218

06 Jun 2024

Tiny models from tiny data: Textual and null-text inversion for few-shot distillation

Erik Landolsi

Fredrik Kahl

DiffM

395

05 Jun 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

Qi Liu

...

279

03 Jun 2024

On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines

242

30 May 2024

Enhancing Vision-Language Model with Unmasked Token Alignment

196

29 May 2024

FocSAM: Delving Deeply into Focused Objects in Segmenting Anything

Liujuan Cao

Rongrong Ji

217

29 May 2024

ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention

310

28 May 2024

Hawk: Learning to Understand Open-World Video Anomalies

Xiaogang Xu

Jiangbo Lu

188

27 May 2024

PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus

268

25 May 2024

Streaming Long Video Understanding with Large Language Models

Dahua Lin

253

113

25 May 2024

DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution

211

25 May 2024

Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models

Yue Zhang

Hehe Fan

Yi Yang

291

24 May 2024

Open-Vocabulary SAM3D: Understand Any 3D Scene

Hanchen Tai

Qingdong He

Jiangning Zhang

Yijie Qian

Ying Tai

Xiaobin Hu

Yabiao Wang

Yong Liu

VLM

285

24 May 2024

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Chae Won Kim

339

24 May 2024

Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers

AAAI Conference on Artificial Intelligence (AAAI), 2024

Bum Jun Kim

Sang Woo Kim

ViT

196

23 May 2024

A Survey on Vision-Language-Action Models for Embodied AI

903

169

23 May 2024

LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate

336

22 May 2024

Influence of Water Droplet Contamination for Transparency Segmentation

326

21 May 2024

OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models

217

21 May 2024

Hierarchical Selective Classification

Neural Information Processing Systems (NeurIPS), 2024

315

19 May 2024

Efficient Multimodal Large Language Models: A Survey

Yizhang Jin

Jian Li

Yexin Liu

Tianjun Gu

Kai Wu

...

Xin Tan

Zhenye Gan

Yabiao Wang

Chengjie Wang

Lizhuang Ma

LRM

307

17 May 2024

Compressive Feature Selection for Remote Visual Multi-Task Inference

Saeed Ranjbar Alvar

Ivan V. Bajić

151

15 May 2024

FreeVA: Offline MLLM as Training-Free Video Assistant

Wenhao Wu

VLM OffRL

297

13 May 2024

EVA-X: A Foundation Model for General Chest X-ray Analysis with Self-supervised Learning

252

08 May 2024

Selective Classification Under Distribution Shifts

371

08 May 2024

THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models

417

08 May 2024

Auto-Encoding Morph-Tokens for Multimodal LLM

International Conference on Machine Learning (ICML), 2024

254

03 May 2024

Multi-modal Learnable Queries for Image Aesthetics Assessment

IEEE International Conference on Multimedia and Expo (ICME), 2024

179

02 May 2024

Towards Incremental Learning in Large Language Models: A Critical Review

M. Jovanovic

Peter Voss

ELM CLL KELM

597

28 Apr 2024

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

Xi Li

253

26 Apr 2024

Leveraging Large Language Models for Multimodal Search

249

24 Apr 2024

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

Ser-Nam Lim

188

23 Apr 2024

AutoAD III: The Prequel -- Back to the Pixels

312

22 Apr 2024

Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering

Jing Liu

307

22 Apr 2024