2211.07636
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Computer Vision and Pattern Recognition (CVPR), 2022
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
ArXiv (abs)
PDF
HTML
HuggingFace (1 upvote)
Github (2496★)
Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"
50 / 579 papers shown
Plug n' Play: Channel Shuffle Module for Enhancing Tiny Vision Transformers
International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2023
Xuwei Xu
Sen Wang
Yudong Chen
Jiajun Liu
ViT
241
1
0
09 Oct 2023
Low-Resolution Self-Attention for Semantic Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yu-Huan Wu
Shi-Chen Zhang
Yun-Hai Liu
Le Zhang
Xin Zhan
Daquan Zhou
Jiashi Feng
Ming-Ming Cheng
Liangli Zhen
ViT
464
11
0
08 Oct 2023
Enhancing Representations through Heterogeneous Self-Supervised Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Zhongyu Li
Bo-Wen Yin
Yongxiang Liu
Tianpeng Liu
Ming-Ming Cheng
SSL
359
3
0
08 Oct 2023
Improved Baselines with Visual Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2023
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
606
4,171
0
05 Oct 2023
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yiren Jian
Tingkai Liu
Yunzhe Tao
Chunhui Zhang
Soroush Vosoughi
HX Yang
VLM
355
20
0
05 Oct 2023
Text-image Alignment for Diffusion-based Perception
Computer Vision and Pattern Recognition (CVPR), 2023
Neehar Kondapaneni
Markus Marks
Manuel Knott
Rogério Guimarães
Pietro Perona
VLM
DiffM
495
53
0
29 Sep 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Yuan Liu
MLLM
790
307
0
26 Sep 2023
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection
Kemal Oksuz
Selim Kuzucu
Tom Joy
P. Dokania
MoE
510
13
0
26 Sep 2023
Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding
Xiaonan Lu
Jianlong Yuan
Ruigang Niu
Yuan Hu
Fan Wang
149
3
0
15 Sep 2023
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
International Conference on Learning Representations (ICLR), 2023
Haozhe Zhao
Zefan Cai
Shuzheng Si
Xiaojian Ma
Kaikai An
Liang Chen
Zixuan Liu
Sheng Wang
Wenjuan Han
Baobao Chang
MLLM
VLM
448
184
0
14 Sep 2023
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation
Yunhao Ge
Lyne Tchapmi
Brian Nlong Zhao
Neel Joshi
Laurent Itti
Vibhav Vineet
DiffM
215
16
0
12 Sep 2023
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
International Conference on Learning Representations (ICLR), 2023
Yang Jin
Kun Xu
Kun Xu
Liwei Chen
Chao Liao
...
Xiaoqiang Lei
Chen Zhang
Wenwu Ou
Kun Gai
Yadong Mu
MLLM
VLM
233
76
0
09 Sep 2023
Do We Still Need Non-Maximum Suppression? Accurate Confidence Estimates and Implicit Duplication Modeling with IoU-Aware Calibration
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Johannes Gilg
Torben Teepe
Fabian Herzog
Philipp Wolters
Gerhard Rigoll
220
2
0
06 Sep 2023
Image Aesthetics Assessment via Learnable Queries
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zhiwei Xiong
Yunfan Zhang
Zhiqi Shen
Peiran Ren
Han Yu
197
5
0
06 Sep 2023
NICE: CVPR 2023 Challenge on Zero-shot Image Captioning
Taehoon Kim
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Mark A Marsden
...
Yujin Wang
Yimu Wang
Tiancheng Gu
Xingchang Lv
Mingmao Sun
VLM
230
8
0
05 Sep 2023
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
Shiji Song
Li Erran Li
Gao Huang
ViT
248
41
0
04 Sep 2023
RevColV2: Exploring Disentangled Representations in Masked Image Modeling
Neural Information Processing Systems (NeurIPS), 2023
Qi Han
Yuxuan Cai
Xiangyu Zhang
303
13
0
02 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2023
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
326
37
0
02 Sep 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
182
32
0
31 Aug 2023
A General-Purpose Self-Supervised Model for Computational Pathology
Richard J. Chen
Tong Ding
Ming Y. Lu
Drew F. K. Williamson
Guillaume Jaume
...
Judy J. Wang
Walt Williams
L. Le
Georg Gerber
Faisal Mahmood
MedIm
327
56
0
29 Aug 2023
VIGC: Visual Instruction Generation and Correction
AAAI Conference on Artificial Intelligence (AAAI), 2023
Sijin Yu
Fan Wu
Xiao Han
Jiahui Peng
Huaping Zhong
...
Xiao-wen Dong
Weijia Li
Wei Li
Yuan Liu
Conghui He
MLLM
328
84
0
24 Aug 2023
Spatial Transform Decoupling for Oriented Object Detection
AAAI Conference on Artificial Intelligence (AAAI), 2023
Hongtian Yu
Yunjie Tian
QiXiang Ye
Yunfan Liu
249
48
0
21 Aug 2023
ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights
Weixian Lei
Yixiao Ge
Jianfeng Zhang
Dylan Sun
Kun Yi
Ying Shan
Mike Zheng Shou
167
1
0
20 Aug 2023
A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023
Changjian Chen
Yukai Guo
Fengyuan Tian
Siyi Liu
Weikai Yang
Zhao-Ming Wang
Jing Wu
Hang Su
Hanspeter Pfister
Shixia Liu
227
21
0
09 Aug 2023
High-Level Parallelism and Nested Features for Dynamic Inference Cost and Top-Down Attention
A. Kelm
Niels Hannemann
Bruno Heberle
Lucas Schmidt
Tim Rolff
Christian Wilms
Ehsan Yaghoubi
Simone Frintrop
194
0
0
09 Aug 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
International Conference on Learning Representations (ICLR), 2023
Juncheng Li
Kaihang Pan
Zhiqi Ge
Minghe Gao
Wei Ji
Wenqiao Zhang
Tat-Seng Chua
Siliang Tang
Hanwang Zhang
Yueting Zhuang
MLLM
312
89
0
08 Aug 2023
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
IEEE Transactions on Big Data (IEEE Trans. Big Data), 2023
Wenqi Shao
Yutao Hu
Shiyang Feng
Meng Lei
Kaipeng Zhang
...
Peng Xu
Siyuan Huang
Jiaming Song
Yuning Qiao
Ping Luo
VLM
MLLM
207
24
0
07 Aug 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
International Conference on Machine Learning (ICML), 2023
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
541
1,029
0
04 Aug 2023
A Parameter-efficient Multi-subject Model for Predicting fMRI Activity
Connor Lane
Gregory Kiar
167
2
0
04 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
International Conference on Learning Representations (ICLR), 2023
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM
MLLM
270
118
0
03 Aug 2023
DETR Doesn't Need Multi-Scale or Locality Design
Yutong Lin
Yuhui Yuan
Zheng Zhang
Chen Li
Nanning Zheng
Han Hu
267
5
0
03 Aug 2023
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension
Qiang-feng Zhou
Chaohui Yu
Shaofeng Zhang
Sitong Wu
Zhibin Wang
Fan Wang
169
32
0
03 Aug 2023
Guided Distillation for Semi-Supervised Instance Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Tariq Berrada
Camille Couprie
Alahari Karteek
Jakob Verbeek
206
21
0
03 Aug 2023
Improving Pixel-based MIM by Reducing Wasted Modeling Capability
IEEE International Conference on Computer Vision (ICCV), 2023
Yuan Liu
Songyang Zhang
Jiacheng Chen
Zhaohui Yu
Kai-xiang Chen
Dahua Lin
208
41
0
01 Aug 2023
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Enxin Song
Wenhao Chai
Guanhong Wang
Yucheng Zhang
Haoyang Zhou
...
Tianbo Ye
Yanting Zhang
Yang Lu
Lei Li
Gaoang Wang
VLM
MLLM
620
453
0
31 Jul 2023
CLIP Brings Better Features to Visual Aesthetics Learners
Liwu Xu
Jinjin Xu
Yuzhe Yang
Yi-Jie Huang
Yanchun Xie
Yaqian Li
VLM
212
5
0
28 Jul 2023
Human-centric Scene Understanding for 3D Large-scale Scenarios
IEEE International Conference on Computer Vision (ICCV), 2023
Yiteng Xu
Peishan Cong
Yichen Yao
Runnan Chen
Yuenan Hou
Xinge Zhu
Xuming He
Jingyi Yu
Yuexin Ma
3DV
185
31
0
26 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
430
152
0
25 Jul 2023
CLIP-KD: An Empirical Study of CLIP Model Distillation
Computer Vision and Pattern Recognition (CVPR), 2023
Chuanguang Yang
Zhulin An
Libo Huang
Junyu Bi
Xinqiang Yu
Hansheng Yang
Boyu Diao
Yongjun Xu
VLM
345
76
0
24 Jul 2023
COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts
IEEE International Conference on Computer Vision (ICCV), 2023
Xiaofeng Mao
YueFeng Chen
Yao Zhu
Da Chen
Hang Su
Rong Zhang
H. Xue
ObjD
OOD
235
30
0
24 Jul 2023
GEM: Boost Simple Network for Glass Surface Segmentation via Vision Foundation Models
IEEE Transactions on Multimedia (IEEE TMM), 2023
Jing Hao
Xinyu Li
Liang Gao
Shumin Han
VLM
DiffM
273
4
0
22 Jul 2023
CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots
IEEE International Conference on Robotics and Automation (ICRA), 2023
D. Rivkin
Nikhil Kakodkar
F. Hogan
Bobak H. Baghi
Gregory Dudek
LM&Ro
283
4
0
21 Jul 2023
Watch out Venomous Snake Species: A Solution to SnakeCLEF2023
Conference and Labs of the Evaluation Forum (CLEF), 2023
Feiran Hu
Peng Wang
Yangyang Li
Chenlong Duan
Zijian Zhu
Fei Wang
Faen Zhang
Yong Li
Xiu-Shen Wei
214
7
0
19 Jul 2023
MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results
Yuki Kondo
Norimichi Ukita
Takayuki Yamaguchi
Haoran Hou
Mu-Yi Shen
...
Ichiro Ide
Yosuke Shinya
Xinyao Liu
Guang Liang
S. Yasui
194
25
0
18 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Neural Information Processing Systems (NeurIPS), 2023
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM
MLLM
388
44
0
13 Jul 2023
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
IEEE International Conference on Computer Vision (ICCV), 2023
Muhammad Uzair Khattak
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
386
308
0
13 Jul 2023
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle
Abhay Jain
Radu Timofte
Goran Glavaš
VLM
MLLM
228
42
0
13 Jul 2023
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
317
88
0
05 Jul 2023
Surgical fine-tuning for Grape Bunch Segmentation under Visual Domain Shifts
European Conference on Mobile Robots (ECMR), 2023
Agnese Chiatti
R. Bertoglio
Nicolás Catalano
Matteo Gatti
Matteo Matteucci
150
4
0
03 Jul 2023
Stitched ViTs are Flexible Vision Backbones
European Conference on Computer Vision (ECCV), 2023
Zizheng Pan
Jing Liu
Haoyu He
Jianfei Cai
Bohan Zhuang
187
4
0
30 Jun 2023