2211.07636
Cited By
v1
v2 (latest)
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Computer Vision and Pattern Recognition (CVPR), 2022
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
ArXiv (abs)
PDF
HTML
HuggingFace (1 upvote)
Github (2496★)
Papers citing
"EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"
50 / 579 papers shown
Visual Position Prompt for MLLM based Visual Grounding
Wei Tang
Yanpeng Sun
Qinying Gu
Zechao Li
VLM
529
7
0
19 Mar 2025
Exploring Disparity-Accuracy Trade-offs in Face Recognition Systems: The Role of Datasets, Architectures, and Loss Functions
International Conference on Web and Social Media (ICWSM), 2025
S. Jaiswal
Sagnik Basu
Sandipan Sikdar
Animesh Mukherjee
142
1
0
18 Mar 2025
CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model
Yuxuan Luo
Jiaqi Tang
Chenyi Huang
Feiyang Hao
Zhouhui Lian
VLM
287
0
0
13 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
335
3
0
12 Mar 2025
Multi-Modal Foundation Models for Computational Pathology: A Survey
Dong Li
Guihong Wan
Xintao Wu
Xinyu Wu
Xiaohui Chen
Yi He
Christine G. Lian
Peter K. Sorger
Yevgeniy R. Semenov
Chen Zhao
MedIm
444
7
0
12 Mar 2025
Scale-Aware Pre-Training for Human-Centric Visual Perception: Enabling Lightweight and Generalizable Models
Xuanhan Wang
Huimin Deng
Lianli Gao
Jingkuan Song
VLM
265
2
0
11 Mar 2025
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking
Computer Vision and Pattern Recognition (CVPR), 2025
Chaocan Xue
Bineng Zhong
Qihua Liang
Yaozong Zheng
Ning Li
Yuanliang Xue
Shuxiang Song
211
29
0
09 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Yue Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
927
9
0
08 Mar 2025
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
Vaibhav Singh
Paria Mehrbod
Adam Ibrahim
Irina Rish
Eugene Belilovsky
Benjamin Thérien
CLL
446
5
0
04 Mar 2025
Generalizable Prompt Learning of CLIP: A Brief Overview
Fangming Cui
Yonggang Zhang
Xuan Wang
Xule Wang
Liang Xiao
VPVLM
VLM
1.4K
2
0
03 Mar 2025
Re-Imagining Multimodal Instruction Tuning: A Representation View
International Conference on Learning Representations (ICLR), 2025
Yiyang Liu
James Liang
Ruixiang Tang
Yugyung Lee
Majid Rabbani
...
Raghuveer M. Rao
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
1.1K
11
0
02 Mar 2025
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
Computer Vision and Pattern Recognition (CVPR), 2025
Ziyang Zhang
Yang Yu
Yucheng Chen
Xulei Yang
S. Yeo
MedIm
539
9
0
02 Mar 2025
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
International Conference on Learning Representations (ICLR), 2025
Shangzhe Di
Zhelun Yu
Guanghao Zhang
Haoyuan Li
Tao Zhong
Hao Cheng
Bolin Li
Wanggui He
Fangxun Shu
Hao Jiang
209
38
0
01 Mar 2025
Towards High-performance Spiking Transformers from ANN to SNN Conversion
ACM Multimedia (MM), 2024
Zihan Huang
Xinyu Shi
Zecheng Hao
Tong Bu
Jianhao Ding
Zhaofei Yu
Tiejun Huang
419
14
0
28 Feb 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Zhaoyi Liu
Huan Zhang
AAML
703
7
0
25 Feb 2025
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
International Conference on Learning Representations (ICLR), 2025
Haoyuan Li
Yanpeng Zhou
Tao Tang
Jifei Song
Yihan Zeng
Michael C. Kampffmeyer
Hang Xu
Xiaodan Liang
3DGS
340
4
0
25 Feb 2025
Pretrained Image-Text Models are Secretly Video Captioners
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Chunhui Zhang
Yiren Jian
Z. Ouyang
Soroush Vosoughi
VLM
485
13
0
20 Feb 2025
VAQUUM: Are Vague Quantifiers Grounded in Visual Data?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hugh Mee Wong
Rick Nouwen
Albert Gatt
466
0
0
17 Feb 2025
Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions
International Conference on Web and Social Media (ICWSM), 2025
Ming Shan Hee
Roy Ka-wei Lee
VLM
243
11
0
16 Feb 2025
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Zhenxing Mi
Kuan-Chieh Wang
Guocheng Qian
Hanrong Ye
Runtao Liu
Sergey Tulyakov
Kfir Aberman
Dan Xu
LRM
318
10
0
12 Feb 2025
HCMRM: A High-Consistency Multimodal Relevance Model for Search Ads
The Web Conference (WWW), 2025
Guobing Gan
Kaiming Gao
Li Wang
Shen Jiang
Peng Jiang
221
1
0
09 Feb 2025
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
International Conference on Learning Representations (ICLR), 2025
Tao Zhang
Jinyong Wen
Zhen Chen
Kun Ding
Di Zhang
Chunhong Pan
432
2
0
04 Feb 2025
Towards Robust Multimodal Large Language Models Against Jailbreak Attacks
Ziyi Yin
Yuanpu Cao
Han Liu
Ting Wang
Jinghui Chen
Fenglong Ma
AAML
338
2
0
02 Feb 2025
Vision-Language Model Selection and Reuse for Downstream Adaptation
Hao-Zhe Tan
Zhi Zhou
Lan-Zhe Guo
Yu-Feng Li
VLM
362
0
0
30 Jan 2025
Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink
Yining Wang
Mi Zhang
Junjie Sun
Chenyue Wang
Min Yang
Hui Xue
Jialing Tao
Ranjie Duan
Qingbin Liu
245
6
0
28 Jan 2025
Rethinking Encoder-Decoder Flow Through Shared Structures
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Frederik Laboyrie
M. K. Yucel
Albert Saà-Garriga
AI4CE
248
0
0
24 Jan 2025
ReasVQA: Advancing VideoQA with Imperfect Reasoning Process
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Jianxin Liang
Xiaojun Meng
Huishuai Zhang
Yijiao Wang
Jiansheng Wei
Dongyan Zhao
LRM
265
3
0
23 Jan 2025
Patent Figure Classification using Large Vision-language Models
European Conference on Information Retrieval (ECIR), 2025
Sushil Awale
Eric Müller-Budack
Ralph Ewerth
210
1
0
22 Jan 2025
TeD-Loc: Text Distillation for Weakly Supervised Object Localization
Shakeeb Murtaza
Soufiane Belharbi
M. Pedersoli
Mohammadhadi Shateri
WSOL
VLM
427
2
0
22 Jan 2025
Sublinear Variational Optimization of Gaussian Mixture Models with Millions to Billions of Parameters
Sebastian Salwig
Till Kahlke
F. Hirschberger
D. Forster
Jorg Lucke
VLM
307
0
0
21 Jan 2025
Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection
Yuanze Li
Haolin Wang
Shihao Yuan
Ming-Yu Liu
Debin Zhao
Yiwen Guo
Chen Xu
Guangming Shi
Wangmeng Zuo
625
42
0
20 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
IEEE Reviews in Biomedical Engineering (RBME), 2024
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
767
71
0
17 Jan 2025
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision
Diego A. Velázquez
Pau Rodríguez López
Sergio Alonso
Josep M. Gonfaus
Jordi Gonzalez
Gerardo Richarte
Javier Marin
Yoshua Bengio
Alexandre Lacoste
268
7
0
14 Jan 2025
Concept Matching with Agent for Out-of-Distribution Detection
YuXiao Lee
Xiaofeng Cao
Jingcai Guo
Wei Ye
Qing Guo
Yi Chang
320
0
0
08 Jan 2025
ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers
Chao Fan
Qipei Mei
Xiaonan Wang
Xinming Li
179
4
0
31 Dec 2024
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Information Fusion (Inf. Fusion), 2024
Hanguang Xiao
Feizhong Zhou
Xianglong Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
450
82
0
31 Dec 2024
Towards Visual Grounding: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
969
31
0
28 Dec 2024
When SAM2 Meets Video Shadow and Mirror Detection
Leiping Jie
VLM
216
1
0
26 Dec 2024
Retention Score: Quantifying Jailbreak Risks for Vision Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Zaitang Li
Pin-Yu Chen
Tsung-Yi Ho
AAML
188
1
0
23 Dec 2024
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yunbin Tu
Liang-Sheng Li
Li Su
Qingming Huang
298
1
0
18 Dec 2024
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Computer Vision and Pattern Recognition (CVPR), 2024
Haoyi Jiang
Liu Liu
Tianheng Cheng
Xinjie Wang
Tianwei Lin
Zhizhong Su
Wen Liu
Xinyu Wang
3DGS
ViT
485
30
0
17 Dec 2024
DINO-Foresight: Looking into the Future with DINO
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
AI4CE
613
15
0
16 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Ruotong Wang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
445
16
0
12 Dec 2024
Mixture of Physical Priors Adapter for Parameter-Efficient Fine-Tuning
Xiping Hu
C. J. Li
QiXiang Ye
Tong Zhang
MoE
255
1
0
03 Dec 2024
HandOS: 3D Hand Reconstruction in One Stage
Computer Vision and Pattern Recognition (CVPR), 2024
Xingyu Chen
Zhuheng Song
Xiaoke Jiang
Yaoqing Hu
Junzhi Yu
Lei Zhang
3DH
HAI
492
5
0
02 Dec 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Byung-Kwan Lee
Ryo Hachiuma
Yu-Chiang Frank Wang
Y. Ro
Yueh-Hua Wu
VLM
393
6
0
02 Dec 2024
SEAL: Semantic Attention Learning for Long Video Representation
Computer Vision and Pattern Recognition (CVPR), 2024
Lan Wang
Yujia Chen
Wen-Sheng Chu
Vishnu Boddeti
Du Tran
VLM
628
7
0
02 Dec 2024
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
Luca Barsellotti
Lorenzo Bianchi
Nicola Messina
F. Carrara
Marcella Cornia
Lorenzo Baraldi
Fabrizio Falchi
Rita Cucchiara
VLM
466
17
0
28 Nov 2024
NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects?
Jiaxuan Li
Junwen Mo
MinhDuc Vo
Akihiro Sugimoto
Hideki Nakayama
322
1
0
26 Nov 2024
Edge Weight Prediction For Category-Agnostic Pose Estimation
Or Hirschorn
S. Avidan
271
1
0
25 Nov 2024