2211.07636
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Computer Vision and Pattern Recognition (CVPR), 2022
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
HuggingFace (1 upvote)
Github (2496★)
Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"
50 / 579 papers shown
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
270
107
0
11 Dec 2023
MAFA: Managing False Negatives for Vision-Language Pre-training
Jaeseok Byun
Dohoon Kim
Taesup Moon
VLM
414
13
0
11 Dec 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Neural Information Processing Systems (NeurIPS), 2023
Jinho Park
Jack Hessel
Khyathi Chandu
Paul Pu Liang
Ximing Lu
...
Youngjae Yu
Qiuyuan Huang
Jianfeng Gao
Ali Farhadi
Yejin Choi
VLM
270
13
0
08 Dec 2023
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
Zhixiang Wei
Lin Chen
Yi Jin
Xiaoxiao Ma
Tianle Liu
Pengyang Lin
Ben Wang
Wei Xu
Jinjin Zheng
481
100
0
07 Dec 2023
AI-SAM: Automatic and Interactive Segment Anything Model
Yimu Pan
Sitao Zhang
Alison D. Gernand
Jeffery A. Goldstein
J. Z. Wang
VLM
224
10
0
05 Dec 2023
Rejuvenating image-GPT as Strong Visual Representation Learners
International Conference on Machine Learning (ICML), 2023
Sucheng Ren
Zeyu Wang
Hongru Zhu
Junfei Xiao
Yaoyao Liu
Cihang Xie
VLM
284
12
0
04 Dec 2023
Bootstrapping SparseFormers from Vision Foundation Models
Computer Vision and Pattern Recognition (CVPR), 2023
Ziteng Gao
Zhan Tong
Kevin Qinghong Lin
Joya Chen
Mike Zheng Shou
197
0
0
04 Dec 2023
InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
Xunguang Wang
Zhenlan Ji
Pingchuan Ma
Zongjie Li
Shuai Wang
MLLM
318
19
0
04 Dec 2023
Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP) (TOMM), 2023
Taichi Nishimura
Shota Nakada
Masayoshi Kondo
VLM
312
6
0
01 Dec 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
276
69
0
30 Nov 2023
DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image
ACM Transactions on Graphics (TOG), 2023
Daoyi Gao
Dávid Rozenberszki
Stefan Leutenegger
Angela Dai
DiffM
301
26
0
30 Nov 2023
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Computer Vision and Pattern Recognition (CVPR), 2023
Qidong Huang
Xiao-wen Dong
Pan Zhang
Sijin Yu
Conghui He
Yuan Liu
Dahua Lin
Weiming Zhang
Neng H. Yu
MLLM
472
363
0
29 Nov 2023
Language-conditioned Detection Transformer
Computer Vision and Pattern Recognition (CVPR), 2023
Jang Hyun Cho
Philipp Krahenbuhl
VLM
ObjD
187
5
0
29 Nov 2023
A Graph-Based Approach for Category-Agnostic Pose Estimation
European Conference on Computer Vision (ECCV), 2023
Or Hirschorn
S. Avidan
369
20
0
29 Nov 2023
Leveraging VLM-Based Pipelines to Annotate 3D Objects
International Conference on Machine Learning (ICML), 2023
Rishabh Kabra
Loic Matthey
Alexander Lerchner
Niloy J. Mitra
274
10
0
29 Nov 2023
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Computer Vision and Pattern Recognition (CVPR), 2023
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Raviteja Vemulapalli
Oncel Tuzel
CLIP
VLM
682
84
0
28 Nov 2023
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
European Conference on Computer Vision (ECCV), 2023
Yanwei Li
Chengyao Wang
Jiaya Jia
VLM
MLLM
331
480
0
28 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Computer Vision and Pattern Recognition (CVPR), 2023
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
203
32
0
27 Nov 2023
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Computer Vision and Pattern Recognition (CVPR), 2023
Jiaxuan Li
D. Vo
Akihiro Sugimoto
Hideki Nakayama
KELM
VLM
266
43
0
27 Nov 2023
Fully Authentic Visual Question Answering Dataset from Online Communities
European Conference on Computer Vision (ECCV), 2023
Chongyan Chen
Xiyang Dai
Noel Codella
Yunsheng Li
Lu Yuan
Danna Gurari
373
9
0
27 Nov 2023
Adapter is All You Need for Tuning Visual Tasks
Dongshuo Yin
Leiyi Hu
Bin Li
Youqun Zhang
273
23
0
25 Nov 2023
Towards Transferable Multi-modal Perception Representation Learning for Autonomy: NeRF-Supervised Masked AutoEncoder
Xiaohao Xu
345
0
0
23 Nov 2023
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs
Yonghui Wang
Wen-gang Zhou
Hao Feng
Keyi Zhou
Houqiang Li
301
25
0
22 Nov 2023
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
European Conference on Computer Vision (ECCV), 2023
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Conghui He
Yuan Liu
Feng Zhao
Dahua Lin
MLLM
VLM
380
936
0
21 Nov 2023
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen
Leyang Shen
Rui Shao
Xiang Deng
Liqiang Nie
VLM
MLLM
302
81
0
20 Nov 2023
Event Camera Data Dense Pre-training
Yan Yang
Liyuan Pan
Liu Liu
146
13
0
20 Nov 2023
Towards Open-Ended Visual Recognition with Large Language Model
Qihang Yu
Xiaohui Shen
Liang-Chieh Chen
VLM
246
8
0
14 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
322
17
0
14 Nov 2023
Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Haowen Pan
Yixin Cao
Xiaozhi Wang
Xun Yang
Meng Wang
KELM
305
37
0
13 Nov 2023
Analyzing Modular Approaches for Visual Question Decomposition
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Apoorv Khandelwal
Ellie Pavlick
Chen Sun
261
5
0
10 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
276
18
0
10 Nov 2023
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yichen Gong
Delong Ran
Jinyuan Liu
Conglei Wang
Tianshuo Cong
Anyu Wang
Sisi Duan
Xiaoyun Wang
MLLM
670
281
0
09 Nov 2023
OtterHD: A High-Resolution Multi-modality Model
Yue Liu
Peiyuan Zhang
Jingkang Yang
Yuanhan Zhang
Fanyi Pu
Ziwei Liu
VLM
MLLM
187
77
0
07 Nov 2023
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Xuwei Xu
Sen Wang
Yudong Chen
Yanping Zheng
Zhewei Wei
Jiajun Liu
ViT
314
21
0
06 Nov 2023
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
Neural Information Processing Systems (NeurIPS), 2023
Jameel Hassan
Hanan Gani
Noor Hussein
Muhammad Uzair Khattak
Muzammal Naseer
Fahad Shahbaz Khan
Salman Khan
VLM
OOD
398
114
0
02 Nov 2023
Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly
Neural Information Processing Systems (NeurIPS), 2023
Qizhang Li
Yiwen Guo
Wangmeng Zuo
Hao Chen
ELM
AAML
285
8
0
02 Nov 2023
AiluRus: A Scalable ViT Framework for Dense Prediction
Neural Information Processing Systems (NeurIPS), 2023
Jin Li
Yaoming Wang
Xiaopeng Zhang
Bowen Shi
Dongsheng Jiang
Chenglin Li
Wenrui Dai
Hongkai Xiong
Qi Tian
286
14
0
02 Nov 2023
CapsFusion: Rethinking Image-Text Data at Scale
Computer Vision and Pattern Recognition (CVPR), 2023
Qiying Yu
Quan-Sen Sun
Xiaosong Zhang
Yufeng Cui
Fan Zhang
Yue Cao
Xinlong Wang
Jingjing Liu
VLM
367
88
0
31 Oct 2023
DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (IEEE TCAD), 2023
Cenlin Duan
Jianlei Yang
Xiaolin He
Yingjie Qi
Yikun Wang
...
Bonan Yan
Xueyan Wang
Xiaotao Jia
Weitao Pan
Weisheng Zhao
128
9
0
31 Oct 2023
Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
Neural Information Processing Systems (NeurIPS), 2023
Zeyinzi Jiang
Chaojie Mao
Ziyuan Huang
Ao Ma
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
248
25
0
30 Oct 2023
Open-NeRF: Towards Open Vocabulary NeRF Decomposition
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Hao Zhang
Fang Li
Narendra Ahuja
186
17
0
25 Oct 2023
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Haoxiang Wang
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Mehrdad Farajtabar
Sachin Mehta
Mohammad Rastegari
Oncel Tuzel
Hadi Pouransari
VLM
555
127
0
23 Oct 2023
MSFormer: A Skeleton-multiview Fusion Method For Tooth Instance Segmentation
Yuan Li
Huan Liu
Y. Tao
Xiangyang He
Haifeng Li
Xiaohu Guo
Hai Lin
280
0
0
23 Oct 2023
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection
Neural Information Processing Systems (NeurIPS), 2023
Lingchen Meng
Xiyang Dai
Jianwei Yang
Dongdong Chen
Yinpeng Chen
Mengchen Liu
Yi-Ling Chen
Zuxuan Wu
Lu Yuan
Yu-Gang Jiang
157
12
0
18 Oct 2023
Beyond Segmentation: Road Network Generation with Multi-Modal LLMs
Sumedh Rasal
Sanjay K. Boddhu
251
8
0
15 Oct 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
1.4K
628
0
14 Oct 2023
Uni3D: Exploring Unified 3D Representation at Scale
International Conference on Learning Representations (ICLR), 2023
Junsheng Zhou
Jinsheng Wang
Baorui Ma
Yu-Shen Liu
Tiejun Huang
Xinlong Wang
255
165
0
10 Oct 2023
On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets
Ning Liao
Shaofeng Zhang
Renqiu Xia
Min Cao
Yu Qiao
Junchi Yan
MLLM
149
0
0
10 Oct 2023
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
International Conference on Learning Representations (ICLR), 2023
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
ReLM
LRM
261
13
0
09 Oct 2023
No Token Left Behind: Efficient Vision Transformer via Dynamic Token Idling
Applied Informatics (AI), 2023
Xuwei Xu
Changlin Li
Yudong Chen
Xiaojun Chang
Jiajun Liu
Sen Wang
ViT
229
10
0
09 Oct 2023