2211.07636
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Computer Vision and Pattern Recognition (CVPR), 2023
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
HuggingFace (1 upvote)
Github (2496★)
Papers citing
"EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"
50 / 579 papers shown
Towards Event-oriented Long Video Understanding
Yifan Du
Kun Zhou
Yuqi Huo
Yifan Li
Wayne Xin Zhao
Haoyu Lu
Zijia Zhao
Bingning Wang
Weipeng Chen
Ji-Rong Wen
VLM
201
19
0
20 Jun 2024
VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning
Ziyang Meng
Yu Dai
Zezheng Gong
Shaoxiong Guo
Minglong Tang
Tongquan Wei
VLM
288
7
0
20 Jun 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye
Yukang Gan
Xiaoke Huang
Yixiao Ge
Yansong Tang
MLLM
VLM
390
51
0
18 Jun 2024
Unveiling Encoder-Free Vision-Language Models
Haiwen Diao
Yufeng Cui
Xiaotong Li
Yueze Wang
Huchuan Lu
Xinlong Wang
VLM
239
66
0
17 Jun 2024
V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results
Jiaqi Wang
Yuhang Zang
Pan Zhang
Tao Chu
Yuhang Cao
...
Kehong Yuan
Yanyan Zu
Jiayao Ha
Qiong Gao
Licheng Jiao
ObjD
262
1
0
17 Jun 2024
Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent
Lin Wang
Zhichao Wang
Xiaoying Tang
236
2
0
17 Jun 2024
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Zebang Cheng
Zhi-Qi Cheng
Jun-Yan He
Yuxuan Zhou
Kai Wang
Yuxiang Lin
Zheng Lian
Xiaojiang Peng
Alexander G. Hauptmann
MLLM
248
115
0
17 Jun 2024
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2024
Narges Norouzi
Svetlana Orlova
Daan de Geus
Gijs Dubbelman
ViT
FedML
219
23
0
14 Jun 2024
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLM
LRM
251
1
0
13 Jun 2024
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang
Xingyu Fu
James Y. Huang
Zekun Li
Qin Liu
...
Kai-Wei Chang
Dan Roth
Sheng Zhang
Hoifung Poon
Muhao Chen
VLM
318
110
0
13 Jun 2024
Comparison Visual Instruction Tuning
Wei Lin
M. Jehanzeb Mirza
Sivan Doveh
Rogerio Feris
Raja Giryes
Sepp Hochreiter
Leonid Karlinsky
276
5
0
13 Jun 2024
Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024
Peixi Wu
Bosong Chai
Xuan Nie
Longquan Yan
Zeyu Wang
Qifan Zhou
Boning Wang
Yansong Peng
Hebei Li
ObjD
221
1
0
13 Jun 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Qingyun Li
Zhe Chen
Weiyun Wang
Wenhai Wang
Shenglong Ye
...
Dahua Lin
Yu Qiao
Botian Shi
Conghui He
Jifeng Dai
VLM
OffRL
261
47
0
12 Jun 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Chenyu Yang
Xizhou Zhu
Jinguo Zhu
Weijie Su
Junjie Wang
...
Lewei Lu
Bin Li
Jie Zhou
Yu Qiao
Jifeng Dai
VLM
CLIP
197
8
0
11 Jun 2024
2DP-2MRC: 2-Dimensional Pointer-based Machine Reading Comprehension Method for Multimodal Moment Retrieval
Jiajun He
Tomoki Toda
245
2
0
10 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
582
26
1
09 Jun 2024
Parameter-Inverted Image Pyramid Networks
Neural Information Processing Systems (NeurIPS), 2024
Xizhou Zhu
Xue Yang
Zhaokai Wang
Hao Li
Wenhan Dou
Junqi Ge
Lewei Lu
Ping Luo
Jifeng Dai
215
9
0
06 Jun 2024
Tiny models from tiny data: Textual and null-text inversion for few-shot distillation
Erik Landolsi
Fredrik Kahl
DiffM
395
1
0
05 Jun 2024
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
Weichao Zhao
Hao Feng
Qi Liu
Jingqun Tang
Shubo Wei
...
Lei Liao
Yongjie Ye
Hao Liu
Houqiang Li
Can Huang
LMTD
276
46
0
03 Jun 2024
On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
Selim Kuzucu
Kemal Oksuz
Jonathan Sadeghi
P. Dokania
239
8
0
30 May 2024
Enhancing Vision-Language Model with Unmasked Token Alignment
Jihao Liu
Jinliang Zheng
Boxiao Liu
Yu Liu
Jiaming Song
CLIP
196
0
0
29 May 2024
FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
You Huang
Zongyu Lan
Liujuan Cao
Xianming Lin
Shengchuan Zhang
Guannan Jiang
Rongrong Ji
VLM
209
6
0
29 May 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
304
8
0
28 May 2024
Hawk: Learning to Understand Open-World Video Anomalies
Jiaqi Tang
Hao Lu
Ruizheng Wu
Xiaogang Xu
Ke Ma
Cheng Fang
Bin Guo
Jiangbo Lu
Qifeng Chen
Ying-Cong Chen
VLM
188
32
0
27 May 2024
PLUG: Revisiting Amodal Segmentation with Foundation Model and Hierarchical Focus
Zhaochen Liu
Limeng Qiao
Xiangxiang Chu
Tingting Jiang
264
2
0
25 May 2024
Streaming Long Video Understanding with Large Language Models
Rui Qian
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Shuangrui Ding
Dahua Lin
Yuan Liu
VLM
251
113
0
25 May 2024
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
Yuzhong Zhao
Feng Liu
Yue Liu
Mingxiang Liao
Chen Gong
QiXiang Ye
Fang Wan
ObjD
211
0
0
25 May 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
288
6
0
24 May 2024
Open-Vocabulary SAM3D: Understand Any 3D Scene
Hanchen Tai
Qingdong He
Jiangning Zhang
Yijie Qian
Ying Tai
Xiaobin Hu
Yabiao Wang
Yong Liu
VLM
285
1
0
24 May 2024
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Byung-Kwan Lee
Chae Won Kim
Beomchan Park
Yong Man Ro
MLLM
LRM
339
28
0
24 May 2024
Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers
AAAI Conference on Artificial Intelligence (AAAI), 2024
Bum Jun Kim
Sang Woo Kim
ViT
196
2
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
885
166
0
23 May 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
336
4
0
22 May 2024
Influence of Water Droplet Contamination for Transparency Segmentation
Volker Knauthe
Paul Weitz
Thomas Pollabauer
Tristan Wirth
Arne Rak
Arjan Kuijper
Dieter W. Fellner
326
1
0
21 May 2024
OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models
Zhaojian Yu
Yinghao Wu
Zhuotao Deng
Yansong Tang
Jinqiang Cui
212
6
0
21 May 2024
Hierarchical Selective Classification
Neural Information Processing Systems (NeurIPS), 2024
Shani Goren
Ido Galil
Ran El-Yaniv
BDL
311
6
0
19 May 2024
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
293
86
0
17 May 2024
Compressive Feature Selection for Remote Visual Multi-Task Inference
Saeed Ranjbar Alvar
Ivan V. Bajić
151
0
0
15 May 2024
FreeVA: Offline MLLM as Training-Free Video Assistant
Wenhao Wu
VLM
OffRL
293
25
0
13 May 2024
EVA-X: A Foundation Model for General Chest X-ray Analysis with Self-supervised Learning
Jingfeng Yao
Xinggang Wang
Yuehao Song
Huangxuan Zhao
Jun Ma
Yajie Chen
Wenyu Liu
Bo Wang
ViT
249
18
0
08 May 2024
Selective Classification Under Distribution Shifts
Hengyue Liang
Le Peng
Ju Sun
UQCV
368
5
0
08 May 2024
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul
Zhizhong Li
Hao Yang
Yonatan Dukler
Ashwin Swaminathan
C. Taylor
Stefano Soatto
HILM
417
28
0
08 May 2024
Auto-Encoding Morph-Tokens for Multimodal LLM
International Conference on Machine Learning (ICML), 2024
Kaihang Pan
Siliang Tang
Juncheng Li
Zhaoyu Fan
Wei Chow
Shuicheng Yan
Tat-Seng Chua
Yueting Zhuang
Hanwang Zhang
MLLM
253
32
0
03 May 2024
Multi-modal Learnable Queries for Image Aesthetics Assessment
IEEE International Conference on Multimedia and Expo (ICME), 2024
Zhiwei Xiong
Yunfan Zhang
Zhiqi Shen
Peiran Ren
Han Yu
EGVM
172
1
0
02 May 2024
Towards Incremental Learning in Large Language Models: A Critical Review
M. Jovanovic
Peter Voss
ELM
CLL
KELM
597
8
0
28 Apr 2024
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Enxin Song
Wenhao Chai
Tianbo Ye
Lei Li
Xi Li
Gaoang Wang
VLM
MLLM
253
51
0
26 Apr 2024
Leveraging Large Language Models for Multimodal Search
Oriol Barbany
Michael Huang
Xinliang Zhu
Arnab Dhua
249
15
0
24 Apr 2024
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Young Kyun Jang
Donghyun Kim
Zihang Meng
Dat Huynh
Ser-Nam Lim
188
18
0
23 Apr 2024
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
312
33
0
22 Apr 2024
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering
Dongze Hao
Qunbo Wang
Longteng Guo
Jie Jiang
Jing Liu
301
9
0
22 Apr 2024