EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2022
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM, CLIP

Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 579 papers shown
Plug n' Play: Channel Shuffle Module for Enhancing Tiny Vision Transformers
International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2023
Xuwei Xu
Sen Wang
Yudong Chen
Jiajun Liu
ViT
241
1
0
09 Oct 2023
Low-Resolution Self-Attention for Semantic Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yu-Huan Wu
Shi-Chen Zhang
Yun-Hai Liu
Le Zhang
Xin Zhan
Daquan Zhou
Jiashi Feng
Ming-Ming Cheng
Liangli Zhen
ViT
464
11
0
08 Oct 2023
Enhancing Representations through Heterogeneous Self-Supervised Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Zhongyu Li
Bo-Wen Yin
Yongxiang Liu
Tianpeng Liu
Ming-Ming Cheng
SSL
359
3
0
08 Oct 2023
Improved Baselines with Visual Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2023
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM, MLLM
606
4,171
0
05 Oct 2023
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yiren Jian
Tingkai Liu
Yunzhe Tao
Chunhui Zhang
Soroush Vosoughi
HX Yang
VLM
355
20
0
05 Oct 2023
Text-image Alignment for Diffusion-based Perception
Computer Vision and Pattern Recognition (CVPR), 2023
Neehar Kondapaneni
Markus Marks
Manuel Knott
Rogério Guimarães
Pietro Perona
VLM, DiffM
495
53
0
29 Sep 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Yuan Liu
MLLM
790
307
0
26 Sep 2023
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection
Kemal Oksuz
Selim Kuzucu
Tom Joy
P. Dokania
MoE
510
13
0
26 Sep 2023
Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding
Xiaonan Lu
Jianlong Yuan
Ruigang Niu
Yuan Hu
Fan Wang
149
3
0
15 Sep 2023
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
International Conference on Learning Representations (ICLR), 2023
Haozhe Zhao
Zefan Cai
Shuzheng Si
Xiaojian Ma
Kaikai An
Liang Chen
Zixuan Liu
Sheng Wang
Wenjuan Han
Baobao Chang
MLLM, VLM
448
184
0
14 Sep 2023
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation
Yunhao Ge
Lyne Tchapmi
Brian Nlong Zhao
Neel Joshi
Laurent Itti
Vibhav Vineet
DiffM
215
16
0
12 Sep 2023
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
International Conference on Learning Representations (ICLR), 2023
Yang Jin
Kun Xu
Kun Xu
Liwei Chen
Chao Liao
...
Xiaoqiang Lei
Chen Zhang
Wenwu Ou
Kun Gai
Yadong Mu
MLLM, VLM
233
76
0
09 Sep 2023
Do We Still Need Non-Maximum Suppression? Accurate Confidence Estimates and Implicit Duplication Modeling with IoU-Aware Calibration
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Johannes Gilg
Torben Teepe
Fabian Herzog
Philipp Wolters
Gerhard Rigoll
220
2
0
06 Sep 2023
Image Aesthetics Assessment via Learnable Queries
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zhiwei Xiong
Yunfan Zhang
Zhiqi Shen
Peiran Ren
Han Yu
197
5
0
06 Sep 2023
NICE: CVPR 2023 Challenge on Zero-shot Image Captioning
Taehoon Kim
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Mark A Marsden
...
Yujin Wang
Yimu Wang
Tiancheng Gu
Xingchang Lv
Mingmao Sun
VLM
230
8
0
05 Sep 2023
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
Shiji Song
Li Erran Li
Gao Huang
ViT
248
41
0
04 Sep 2023
RevColV2: Exploring Disentangled Representations in Masked Image Modeling
Neural Information Processing Systems (NeurIPS), 2023
Qi Han
Yuxuan Cai
Xiangyu Zhang
303
13
0
02 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2023
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD, VLM
326
37
0
02 Sep 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
182
32
0
31 Aug 2023
A General-Purpose Self-Supervised Model for Computational Pathology
Richard J. Chen
Tong Ding
Ming Y. Lu
Drew F. K. Williamson
Guillaume Jaume
...
Judy J. Wang
Walt Williams
L. Le
Georg Gerber
Faisal Mahmood
MedIm
327
56
0
29 Aug 2023
VIGC: Visual Instruction Generation and Correction
AAAI Conference on Artificial Intelligence (AAAI), 2023
Sijin Yu
Fan Wu
Xiao Han
Jiahui Peng
Huaping Zhong
...
Xiao-wen Dong
Weijia Li
Wei Li
Yuan Liu
Conghui He
MLLM
328
84
0
24 Aug 2023
Spatial Transform Decoupling for Oriented Object Detection
AAAI Conference on Artificial Intelligence (AAAI), 2023
Hongtian Yu
Yunjie Tian
QiXiang Ye
Yunfan Liu
249
48
0
21 Aug 2023
ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights
Weixian Lei
Yixiao Ge
Jianfeng Zhang
Dylan Sun
Kun Yi
Ying Shan
Mike Zheng Shou
167
1
0
20 Aug 2023
A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023
Changjian Chen
Yukai Guo
Fengyuan Tian
Siyi Liu
Weikai Yang
Zhao-Ming Wang
Jing Wu
Hang Su
Hanspeter Pfister
Shixia Liu
227
21
0
09 Aug 2023
High-Level Parallelism and Nested Features for Dynamic Inference Cost and Top-Down Attention
A. Kelm
Niels Hannemann
Bruno Heberle
Lucas Schmidt
Tim Rolff
Christian Wilms
Ehsan Yaghoubi
Simone Frintrop
194
0
0
09 Aug 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
International Conference on Learning Representations (ICLR), 2023
Juncheng Li
Kaihang Pan
Zhiqi Ge
Minghe Gao
Wei Ji
Wenqiao Zhang
Tat-Seng Chua
Siliang Tang
Hanwang Zhang
Yueting Zhuang
MLLM
312
89
0
08 Aug 2023
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
IEEE Transactions on Big Data (IEEE Trans. Big Data), 2023
Wenqi Shao
Yutao Hu
Shiyang Feng
Meng Lei
Kaipeng Zhang
...
Peng Xu
Siyuan Huang
Jiaming Song
Yuning Qiao
Ping Luo
VLM, MLLM
207
24
0
07 Aug 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
International Conference on Machine Learning (ICML), 2023
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
541
1,029
0
04 Aug 2023
A Parameter-efficient Multi-subject Model for Predicting fMRI Activity
Connor Lane
Gregory Kiar
167
2
0
04 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
International Conference on Learning Representations (ICLR), 2023
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM, MLLM
270
118
0
03 Aug 2023
DETR Doesn't Need Multi-Scale or Locality Design
Yutong Lin
Yuhui Yuan
Zheng Zhang
Chen Li
Nanning Zheng
Han Hu
267
5
0
03 Aug 2023
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension
Qiang-feng Zhou
Chaohui Yu
Shaofeng Zhang
Sitong Wu
Zhibin Wang
Fan Wang
169
32
0
03 Aug 2023
Guided Distillation for Semi-Supervised Instance Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Tariq Berrada
Camille Couprie
Alahari Karteek
Jakob Verbeek
206
21
0
03 Aug 2023
Improving Pixel-based MIM by Reducing Wasted Modeling Capability
IEEE International Conference on Computer Vision (ICCV), 2023
Yuan Liu
Songyang Zhang
Jiacheng Chen
Zhaohui Yu
Kai-xiang Chen
Dahua Lin
208
41
0
01 Aug 2023
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Enxin Song
Wenhao Chai
Guanhong Wang
Yucheng Zhang
Haoyang Zhou
...
Tianbo Ye
Yanting Zhang
Yang Lu
Lei Li
Gaoang Wang
VLM, MLLM
620
453
0
31 Jul 2023
CLIP Brings Better Features to Visual Aesthetics Learners
Liwu Xu
Jinjin Xu
Yuzhe Yang
Yi-Jie Huang
Yanchun Xie
Yaqian Li
VLM
212
5
0
28 Jul 2023
Human-centric Scene Understanding for 3D Large-scale Scenarios
IEEE International Conference on Computer Vision (ICCV), 2023
Yiteng Xu
Peishan Cong
Yichen Yao
Runnan Chen
Yuenan Hou
Xinge Zhu
Xuming He
Jingyi Yu
Yuexin Ma
3DV
185
31
0
26 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
430
152
0
25 Jul 2023
CLIP-KD: An Empirical Study of CLIP Model Distillation
Computer Vision and Pattern Recognition (CVPR), 2023
Chuanguang Yang
Zhulin An
Libo Huang
Junyu Bi
Xinqiang Yu
Hansheng Yang
Boyu Diao
Yongjun Xu
VLM
345
76
0
24 Jul 2023
COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts
IEEE International Conference on Computer Vision (ICCV), 2023
Xiaofeng Mao
YueFeng Chen
Yao Zhu
Da Chen
Hang Su
Rong Zhang
H. Xue
ObjD, OOD
235
30
0
24 Jul 2023
GEM: Boost Simple Network for Glass Surface Segmentation via Vision Foundation Models
IEEE Transactions on Multimedia (IEEE TMM), 2023
Jing Hao
Xinyu Li
Liang Gao
Shumin Han
VLM, DiffM
273
4
0
22 Jul 2023
CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots
IEEE International Conference on Robotics and Automation (ICRA), 2023
D. Rivkin
Nikhil Kakodkar
F. Hogan
Bobak H. Baghi
Gregory Dudek
LM&Ro
283
4
0
21 Jul 2023
Watch out Venomous Snake Species: A Solution to SnakeCLEF2023
Conference and Labs of the Evaluation Forum (CLEF), 2023
Feiran Hu
Peng Wang
Yangyang Li
Chenlong Duan
Zijian Zhu
Fei Wang
Faen Zhang
Yong Li
Xiu-Shen Wei
214
7
0
19 Jul 2023
MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results
Yuki Kondo
Norimichi Ukita
Takayuki Yamaguchi
Haoran Hou
Mu-Yi Shen
...
Ichiro Ide
Yosuke Shinya
Xinyao Liu
Guang Liang
S. Yasui
194
25
0
18 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Neural Information Processing Systems (NeurIPS), 2023
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM, MLLM
388
44
0
13 Jul 2023
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
IEEE International Conference on Computer Vision (ICCV), 2023
Muhammad Uzair Khattak
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
386
308
0
13 Jul 2023
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle
Abhay Jain
Radu Timofte
Goran Glavaš
VLM, MLLM
228
42
0
13 Jul 2023
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
317
88
0
05 Jul 2023
Surgical fine-tuning for Grape Bunch Segmentation under Visual Domain Shifts
European Conference on Mobile Robots (ECMR), 2023
Agnese Chiatti
R. Bertoglio
Nicolás Catalano
Matteo Gatti
Matteo Matteucci
150
4
0
03 Jul 2023
Stitched ViTs are Flexible Vision Backbones
European Conference on Computer Vision (ECCV), 2023
Zizheng Pan
Jing Liu
Haoyu He
Jianfei Cai
Bohan Zhuang
187
4
0
30 Jun 2023