EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2022

14 November 2022

Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 579 papers shown

4M: Massively Multimodal Masked Modeling

11 Dec 2023

MAFA: Managing False Negatives for Vision-Language Pre-training

11 Dec 2023

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

Neural Information Processing Systems (NeurIPS), 2023

...

Yejin Choi

08 Dec 2023

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

07 Dec 2023

AI-SAM: Automatic and Interactive Segment Anything Model

05 Dec 2023

Rejuvenating image-GPT as Strong Visual Representation Learners

International Conference on Machine Learning (ICML), 2023

Cihang Xie

04 Dec 2023

Bootstrapping SparseFormers from Vision Foundation Models

Computer Vision and Pattern Recognition (CVPR), 2023

04 Dec 2023

InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models

04 Dec 2023

Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP) (TOMM), 2023

01 Dec 2023

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

Ran Xu

Silvio Savarese

Caiming Xiong

Juan Carlos Niebles

VLM MLLM

30 Nov 2023

DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image

ACM Transactions on Graphics (TOG), 2023

30 Nov 2023

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Computer Vision and Pattern Recognition (CVPR), 2023

Conghui He

Dahua Lin

29 Nov 2023

Language-conditioned Detection Transformer

Computer Vision and Pattern Recognition (CVPR), 2023

Jang Hyun Cho

Philipp Krahenbuhl

VLM ObjD

29 Nov 2023

A Graph-Based Approach for Category-Agnostic Pose Estimation

European Conference on Computer Vision (ECCV), 2023

Or Hirschorn

S. Avidan

29 Nov 2023

Leveraging VLM-Based Pipelines to Annotate 3D Objects

International Conference on Machine Learning (ICML), 2023

Rishabh Kabra

Loic Matthey

Alexander Lerchner

Niloy J. Mitra

29 Nov 2023

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

Computer Vision and Pattern Recognition (CVPR), 2023

Pavan Kumar Anasosalu Vasu

28 Nov 2023

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

European Conference on Computer Vision (ECCV), 2023

28 Nov 2023

ViT-Lens: Towards Omni-modal Representations

Computer Vision and Pattern Recognition (CVPR), 2023

Ying Shan

27 Nov 2023

EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

Computer Vision and Pattern Recognition (CVPR), 2023

27 Nov 2023

Fully Authentic Visual Question Answering Dataset from Online Communities

European Conference on Computer Vision (ECCV), 2023

Chongyan Chen

Xiyang Dai

Noel Codella

Yunsheng Li

Lu Yuan

Danna Gurari

27 Nov 2023

Adapter is All You Need for Tuning Visual Tasks

25 Nov 2023

Towards Transferable Multi-modal Perception Representation Learning for Autonomy: NeRF-Supervised Masked AutoEncoder

Xiaohao Xu

23 Nov 2023

Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs

Hao Feng

22 Nov 2023

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

European Conference on Computer Vision (ECCV), 2023

Conghui He

Dahua Lin

21 Nov 2023

LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

20 Nov 2023

Event Camera Data Dense Pre-training

Yan Yang

Liyuan Pan

Liu Liu

20 Nov 2023

Towards Open-Ended Visual Recognition with Large Language Model

Qihang Yu

Xiaohui Shen

Liang-Chieh Chen

VLM

14 Nov 2023

Vision-Language Instruction Tuning: A Review and Analysis

Ying Shan

14 Nov 2023

Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Xiaozhi Wang

13 Nov 2023

Analyzing Modular Approaches for Visual Question Decomposition

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Apoorv Khandelwal

Ellie Pavlick

Chen Sun

10 Nov 2023

How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model

Shasha Li

10 Nov 2023

FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts

AAAI Conference on Artificial Intelligence (AAAI), 2023

09 Nov 2023

OtterHD: A High-Resolution Multi-modality Model

Ziwei Liu

07 Nov 2023

GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

06 Nov 2023

Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

Neural Information Processing Systems (NeurIPS), 2023

Jameel Hassan

Hanan Gani

Noor Hussein

Muhammad Uzair Khattak

Muzammal Naseer

Fahad Shahbaz Khan

Salman Khan

VLM OOD

02 Nov 2023

Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly

Neural Information Processing Systems (NeurIPS), 2023

Wangmeng Zuo

02 Nov 2023

AiluRus: A Scalable ViT Framework for Dense Prediction

Neural Information Processing Systems (NeurIPS), 2023

02 Nov 2023

CapsFusion: Rethinking Image-Text Data at Scale

Computer Vision and Pattern Recognition (CVPR), 2023

31 Oct 2023

DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (IEEE TCAD), 2023

...

31 Oct 2023

Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone

Neural Information Processing Systems (NeurIPS), 2023

Zeyinzi Jiang

30 Oct 2023

Open-NeRF: Towards Open Vocabulary NeRF Decomposition

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Hao Zhang

Fang Li

Narendra Ahuja

25 Oct 2023

SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

Haoxiang Wang

Pavan Kumar Anasosalu Vasu

23 Oct 2023

MSFormer: A Skeleton-multiview Fusion Method For Tooth Instance Segmentation

23 Oct 2023

Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection

Neural Information Processing Systems (NeurIPS), 2023

Jianwei Yang

Zuxuan Wu

Lu Yuan

Yu-Gang Jiang

18 Oct 2023

Beyond Segmentation: Road Network Generation with Multi-Modal LLMs

Sumedh Rasal

Sanjay K. Boddhu

15 Oct 2023

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

Raghuraman Krishnamoorthi

14 Oct 2023

Uni3D: Exploring Unified 3D Representation at Scale

International Conference on Learning Representations (ICLR), 2023

Tiejun Huang

10 Oct 2023

On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets

Yu Qiao

10 Oct 2023

Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models

International Conference on Learning Representations (ICLR), 2023

Archiki Prasad

Elias Stengel-Eskin

Mohit Bansal

ReLM LRM

09 Oct 2023

No Token Left Behind: Efficient Vision Transformer via Dynamic Token Idling

Applied Informatics (AI), 2023

Xiaojun Chang

09 Oct 2023