2211.07636
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Computer Vision and Pattern Recognition (CVPR), 2022
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
ArXiv (abs)
PDF
HTML
HuggingFace (1 upvote)
GitHub (2496★)
Papers citing
"EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"
50 / 579 papers shown
M4-BLIP: Advancing Multi-Modal Media Manipulation Detection through Face-Enhanced Local Analysis
Hang Wu
Ke Sun
Jiayi Ji
Xiaoshuai Sun
Rongrong Ji
127
0
0
01 Dec 2025
DEAL-300K: Diffusion-based Editing Area Localization with a 300K-Scale Dataset and Frequency-Prompted Baseline
Rui Zhang
Hongxia Wang
Hangqing Liu
Yang Zhou
Q. Zeng
81
0
0
28 Nov 2025
Frequency-Aware Token Reduction for Efficient Vision Transformer
Dong-Jae Lee
Jiwan Hur
Jaehyun Choi
Jaemyung Yu
Junmo Kim
188
0
0
26 Nov 2025
MuM: Multi-View Masked Image Modeling for 3D Vision
David Nordström
Johan Edstedt
Fredrik Kahl
Georg Bökman
198
0
0
21 Nov 2025
NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization
Xiyuan Wei
Chih-Jen Lin
Tianbao Yang
VLM
128
0
0
11 Nov 2025
Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges
Kemal Oksuz
Alexandru Buburuzan
Anthony Knittel
Yuhan Yao
P. Dokania
81
0
0
31 Oct 2025
BLM$_1$: A Boundless Large Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning
Wentao Tan
Bowen Wang
Heng Zhi
Chenyu Liu
Z. Li
...
Chen Xu
Zhibin Wang
Tianshi Wang
Lei Zhu
Heng Tao Shen
LM&Ro
168
0
0
28 Oct 2025
One-Timestep is Enough: Achieving High-performance ANN-to-SNN Conversion via Scale-and-Fire Neurons
Qiuyang Chen
Huiqi Yang
Qingyan Meng
Zhengyu Ma
109
0
0
27 Oct 2025
HyperET: Efficient Training in Hyperbolic Space for Multi-modal Large Language Models
Zelin Peng
Zhengqin Xu
Qingyang Liu
Xiaokang Yang
Wei Shen
233
0
0
23 Oct 2025
Towards Single-Source Domain Generalized Object Detection via Causal Visual Prompts
Chen Li
Huiying Xu
Changxin Gao
Zeyu Wang
Y. Liu
Xinzhong Zhu
123
0
0
22 Oct 2025
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
Xiaoxing Hu
Kaicheng Yang
Ziyang Gong
Qi Ming
Zonghao Guo
Xiang An
Ziyong Feng
Junchi Yan
Xue Yang
CLIP
VLM
227
0
0
21 Oct 2025
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Mingxuan Li
Silei Wu
Linjun Dai
Xiaohua Wang
Hanming Deng
Lewei Lu
Dahua Lin
Ziwei Liu
VLM
156
0
0
16 Oct 2025
Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking
Mitchell Keren Taraday
Shahaf Wagner
Chaim Baskin
VLM
110
1
0
08 Oct 2025
Emergent AI Surveillance: Overlearned Person Re-Identification and Its Mitigation in Law Enforcement Context
An Thi Nguyen
Radina Stoykova
Eric Arazo
124
0
0
07 Oct 2025
Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs
Yongyi Su
H. Zhang
Shijie Li
Nanqing Liu
Jingyi Liao
...
Chen Li
Nancy F. Chen
Shuicheng Yan
Xulei Yang
Xun Xu
MLLM
VLM
174
3
0
02 Oct 2025
NeMo: Needle in a Montage for Video-Language Understanding
Zi-Yuan Hu
Shuo Liang
Duo Zheng
Yanyang Li
Yeyao Tao
...
Jianguang Yu
Jing-ling Huang
Meng Fang
Yin Li
Liwei Wang
169
2
0
29 Sep 2025
SVAC: Scaling Is All You Need For Referring Video Object Segmentation
Li Zhang
Haoxiang Gao
Zhihao Zhang
Luoxiao Huang
Tao Zhang
VOS
149
0
0
28 Sep 2025
MMPB: It's Time for Multi-Modal Personalization
Jaeik Kim
Woojin Kim
Woohyeon Park
Jaeyoung Do
VLM
190
0
0
26 Sep 2025
Advancing Metallic Surface Defect Detection via Anomaly-Guided Pretraining on a Large Industrial Dataset
Chuni Liu
Hongjie Li
Jiaqi Du
Yangyang Hou
Qian Sun
Lei Jin
Ke Xu
OnRL
AI4CE
232
0
0
23 Sep 2025
MRN: Harnessing 2D Vision Foundation Models for Diagnosing Parkinson's Disease with Limited 3D MR Data
Ding Shaodong
Liu Ziyang
Zhou Yijun
Liu Tao
112
0
0
22 Sep 2025
SCENEFORGE: Enhancing 3D-text alignment with Structured Scene Compositions
Cristian Sbrolli
Matteo Matteucci
180
0
0
19 Sep 2025
RangeSAM: On the Potential of Visual Foundation Models for Range-View represented LiDAR segmentation
Paul Julius Kühn
Duc Anh Nguyen
Arjan Kuijper
Holger Graf
Dieter W. Fellner
3DPC
285
0
0
19 Sep 2025
An Empirical Analysis of VLM-based OOD Detection: Mechanisms, Advantages, and Sensitivity
YuXiao Lee
Xiaofeng Cao
Wei Ye
Jiangchao Yao
Jingkuan Song
Heng Tao Shen
MLLM
180
0
0
16 Sep 2025
ER-LoRA: Effective-Rank Guided Adaptation for Weather-Generalized Depth Estimation
Weilong Yan
Xin Zhang
Robby T. Tan
MDE
252
0
0
31 Aug 2025
Category-level Text-to-Image Retrieval Improved: Bridging the Domain Gap with Diffusion Models and Vision Encoders
Faizan Farooq Khan
Vladan Stojnić
Zakaria Laskar
Mohamed Elhoseiny
Giorgos Tolias
DiffM
VLM
100
0
0
29 Aug 2025
MobileCLIP2: Improving Multi-Modal Reinforced Training
Fartash Faghri
Pavan Kumar Anasosalu Vasu
Cem Koc
Vaishaal Shankar
Alexander Toshev
Oncel Tuzel
Hadi Pouransari
CLIP
VLM
432
1
0
28 Aug 2025
Multimodal LLMs See Sentiment
Neemias B. da Silva
John Harrison
Rodrigo Minetto
Myriam Delgado
B. Nassu
Thiago H Silva
135
0
0
23 Aug 2025
From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations
Anthony Bisulco
Rahul Ramesh
Randall Balestriero
Pratik Chaudhari
122
0
0
21 Aug 2025
Temporal Grounding as a Learning Signal for Referring Video Object Segmentation
Seunghun Lee
Jiwan Seo
Jeonghoon Kim
S. Kim
Siwon Kim
...
Wonhyeok Choi
Jaehoon Jeong
Zane Durante
Sang Hyun Park
Sunghoon Im
VOS
206
0
0
16 Aug 2025
Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors?
Xuezheng Chen
Zhengbo Zou
MLLM
95
0
0
14 Aug 2025
Failures to Surface Harmful Contents in Video Large Language Models
Yuxin Cao
Wei Song
Derui Wang
Jingling Xue
Jin Song Dong
AAML
161
3
0
14 Aug 2025
DoorDet: Semi-Automated Multi-Class Door Detection Dataset via Object Detection and Large Language Models
Licheng Zhang
Bach Le
Naveed Akhtar
Tuan Ngo
109
1
0
11 Aug 2025
Membership Inference Attacks with False Discovery Rate Control
Chenxu Zhao
Wei Qian
Aobo Chen
Mengdi Huai
133
1
0
09 Aug 2025
CoCAViT: Compact Vision Transformer with Robust Global Coordination
Xuyang Wang
Lingjuan Miao
Zhiqiang Zhou
ViT
VLM
112
0
0
07 Aug 2025
A Survey on Video Temporal Grounding with Multimodal Large Language Model
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Yue Yu
Wei Liu
Y. Liu
Meng-yang Liu
Liqiang Nie
Zhouchen Lin
C. Chen
AI4TS
VLM
LRM
145
6
0
07 Aug 2025
Multi-Granularity Feature Calibration via VFM for Domain Generalized Semantic Segmentation
Xinhui Li
Xiaojie Guo
140
0
0
05 Aug 2025
Adversarial Attention Perturbations for Large Object Detection Transformers
Zachary Yahn
Selim Furkan Tekin
Fatih Ilhan
Sihao Hu
Tiansheng Huang
Yichang Xu
Margaret Loper
Ling Liu
AAML
ViT
135
2
0
05 Aug 2025
A Multi-Agent System for Complex Reasoning in Radiology Visual Question Answering
Ziruo Yi
Jinyu Liu
Ting Xiao
Mark V. Albert
199
0
0
04 Aug 2025
Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting
Miaosen Luo
Jiesen Long
Zequn Li
Yunying Yang
Yuncheng Jiang
Sijie Mai
200
2
0
04 Aug 2025
Set Pivot Learning: Redefining Generalized Segmentation with Vision Foundation Models
Xinhui Li
Xinyu He
Qiming Hu
Xiaojie Guo
123
0
0
03 Aug 2025
Rein++: Efficient Generalization and Adaptation for Semantic Segmentation with Vision Foundation Models
Zhixiang Wei
Xiaoxiao Ma
Ruishen Yan
Tao Tu
Wei Xu
Jinjin Zheng
Yi-jing Jin
Enhong Chen
VLM
175
1
0
03 Aug 2025
Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models
Hyundong Jin
Hyung Jin Chang
Eunwoo Kim
VLM
135
0
0
01 Aug 2025
ART: Adaptive Relation Tuning for Generalized Relation Prediction
Gopika Sudhakaran
Hikaru Shindo
P. Schramowski
Simone Schaub-Meyer
Kristian Kersting
Stefan Roth
134
0
0
31 Jul 2025
DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception
Pei Deng
Wenqian Zhou
Hanlin Wu
115
0
0
30 Jul 2025
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
Zhixiang Wei
Guangting Wang
Xiaoxiao Ma
Ke Mei
Wei Xu
Yi-jing Jin
Fengyun Rao
CLIP
MLLM
VLM
167
5
0
30 Jul 2025
TESPEC: Temporally-Enhanced Self-Supervised Pretraining for Event Cameras
Mohammad Mohammadi
Ziyi Wu
Igor Gilitschenski
ViT
152
0
0
29 Jul 2025
The Early Bird Identifies the Worm: You Can't Beat a Head Start in Long-Term Body Re-ID (ECHO-BID)
Thomas M. Metz
Matthew Q. Hill
A. O’Toole
206
1
0
23 Jul 2025
Latent Denoising Makes Good Visual Tokenizers
Jiawei Yang
Tianhong Li
Lijie Fan
Yonglong Tian
Yue Wang
192
13
0
21 Jul 2025
ChestGPT: Integrating Large Language Models and Vision Transformers for Disease Detection and Localization in Chest X-Rays
Shehroz S. Khan
Petar Przulj
A. Ashraf
Ali Abedi
LM&MA
MedIm
151
1
0
04 Jul 2025
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Qi Qin
...
Bin Fu
Xiaokang Yang
Guangtao Zhai
Ming-Hsuan Yang
Xiaohong Liu
VLM
596
86
0
01 Jul 2025