EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2022
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM, CLIP

Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 579 papers shown
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Qi Qin
...
Bin Fu
Xiaokang Yang
Guangtao Zhai
Ming-Hsuan Yang
Xiaohong Liu
VLM
599
86
0
01 Jul 2025
HAWAII: Hierarchical Visual Knowledge Transfer for Efficient Vision-Language Models
Yimu Wang
Mozhgan Nasr Azadani
Sean Sedwards
Krzysztof Czarnecki
VLM
159
1
0
23 Jun 2025
LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models
Fanfei Li
Thomas Klein
Wieland Brendel
Robert Geirhos
Roland S. Zimmermann
OODD
211
3
0
20 Jun 2025
DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs
Bo-Cheng Chiu
Jen-Jee Chen
Yu-Chee Tseng
Feng-Chi Chen
327
0
0
13 Jun 2025
Beyond Overconfidence: Foundation Models Redefine Calibration in Deep Neural Networks
Achim Hekler
Lukas Kuhn
Florian Buettner
UQCV
255
1
0
11 Jun 2025
Leveraging Depth and Language for Open-Vocabulary Domain-Generalized Semantic Segmentation
Siyu Chen
Ting Han
Chengzheng Fu
Changshe Zhang
Chaolei Wang
Jinhe Su
Guorong Cai
Meiliu Wu
ObjD, VLM
295
1
0
11 Jun 2025
When Kernels Multiply, Clusters Unify: Fusing Embeddings with the Kronecker Product
Youqi Wu
Jingwei Zhang
Farzan Farnia
229
1
0
10 Jun 2025
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal
Reza Shirkavand
Heng-Chiao Huang
Gowthami Somepalli
Tom Goldstein
280
3
0
09 Jun 2025
The State-of-the-Art in Lifelog Retrieval: A Review of Progress at the ACM Lifelog Search Challenge Workshop 2022-24
Allie Tran
Werner Bailer
Duc-Tien Dang-Nguyen
Graham Healy
Steve Hodges
...
Luca Rossetto
Klaus Schoeffmann
Minh-Triet Tran
Lucia Vadicamo
C. Gurrin
159
5
0
07 Jun 2025
Aligning Multimodal Representations through an Information Bottleneck
Antonio Almudévar
José Miguel Hernández-Lobato
Sameer Khurana
R. Marxer
Alfonso Ortega
SSL
294
5
0
05 Jun 2025
Fighting Fire with Fire (F3): A Training-free and Efficient Visual Adversarial Example Purification Method in LVLMs
Yudong Zhang
Ruobing Xie
Yiqing Huang
Jiansheng Chen
Xingwu Sun
Zhanhui Kang
Di Wang
Yu Wang
AAML
339
1
0
01 Jun 2025
The Security Threat of Compressed Projectors in Large Vision-Language Models
Yudong Zhang
Ruobing Xie
Xingwu Sun
Jiansheng Chen
Zhanhui Kang
Di Wang
Yu Wang
145
0
0
31 May 2025
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation
Computer Vision and Pattern Recognition (CVPR), 2025
Yichen Xie
Runsheng Xu
Tong He
Jyh-Jing Hwang
Katie Luo
...
Letian Chen
Yiren Lu
Zhaoqi Leng
Dragomir Anguelov
Mingxing Tan
VLM, LRM
271
10
0
30 May 2025
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
Gen Luo
Ganlin Yang
Ziyang Gong
Guanzhou Chen
Haonan Duan
...
Wenhai Wang
Jifeng Dai
Yu Qiao
Rongrong Ji
X. Zhu
LM&Ro
203
19
0
30 May 2025
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Computer Vision and Pattern Recognition (CVPR), 2025
Yunze Man
De-An Huang
Guilin Liu
Shiwei Sheng
Shilong Liu
Liang-Yan Gui
Jan Kautz
Yu Wang
Zhiding Yu
MLLM, LRM
335
19
0
29 May 2025
Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration
Mehrdad Fazli
Bowen Wei
Ahmet Sari
Ziwei Zhu
VLM
474
3
0
27 May 2025
The Missing Point in Vision Transformers for Universal Image Segmentation
Sajjad Shahabodini
Mobina Mansoori
Farnoush Bayatmakou
J. Abouei
Konstantinos N. Plataniotis
Arash Mohammadi
ViT, ISeg
316
0
0
26 May 2025
FastCAV: Efficient Computation of Concept Activation Vectors for Explaining Deep Neural Networks
Laines Schmalwasser
Niklas Penzel
Joachim Denzler
Julia Niebling
178
4
0
23 May 2025
Semantic segmentation with reward
Xie Ting
Ye Huang
Zhilin Liu
Lixin Duan
514
0
0
23 May 2025
DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval
Yuxin Yang
Yinan Zhou
Yuxin Chen
Ziqi Zhang
Zongyang Ma
...
Bing Li
Lin Song
Jun Gao
Peng Li
Weiming Hu
464
1
0
23 May 2025
NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment
Shuhao Han
Haotian Fan
Fangyuan Kong
Wenjie Liao
Chunle Guo
...
Jian Guo
Zhizhuo Shao
Ziyu Feng
Bing Li
Weiming Hu
381
24
0
22 May 2025
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval
Siting Li
Xiang Gao
Simon Shaolei Du
452
1
0
21 May 2025
Exploring The Visual Feature Space for Multimodal Neural Decoding
Weihao Xia
Steven Chacko
276
4
0
21 May 2025
Know When to Abstain: Optimal Selective Classification with Likelihood Ratios
Alvin Heng
Harold Soh
360
1
0
21 May 2025
Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives
IEEE Geoscience and Remote Sensing Magazine (GRSM), 2025
Xingxing Weng
Chao Pang
Gui-Song Xia
VLM
351
12
0
20 May 2025
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding
Thong Nguyen
Zhiyuan Hu
Xu Lin
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
378
1
0
19 May 2025
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang
Sarah Monazam Erfani
Yige Li
Jiabo He
James Bailey
AAML
474
9
0
08 May 2025
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Cunxin Fan
Xiaosong Jia
Yihang Sun
Yixiao Wang
Jianglan Wei
...
Xiangyu Zhao
Masayoshi Tomizuka
Songyuan Li
Junchi Yan
Mingyu Ding
LM&Ro, VLM
371
25
0
04 May 2025
DEEMO: De-identity Multimodal Emotion Recognition and Reasoning
Deng Li
Bohao Xing
Xin Liu
Baiqiang Xia
Bihan Wen
Heikki Kälviäinen
VLM
292
6
0
28 Apr 2025
MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation
International Conference on Learning Representations (ICLR), 2025
Siyi Jiao
Wenzheng Zeng
Y. Li
Han Zhang
Changxin Gao
Nong Sang
Mike Zheng Shou
246
1
0
20 Apr 2025
Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation
Siyu Chen
Ting Han
Changshe Zhang
Xin Luo
Meiliu Wu
Guorong Cai
Jinhe Su
MDE
404
1
0
17 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD, VOS
666
107
0
17 Apr 2025
Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis
Shravan Chaudhari
Trilokya Akula
Yoon Kim
Tom Blake
LRM
192
1
0
16 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
Xuelong Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
229
20
0
14 Apr 2025
CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates
Ankit Kumar Shaw
Yunlong Wang
Tuopu Wen
Chandan Kumar Sah
Xinyu Jiao
Mengmeng Yang
Ke Wang
Xiaoli Lian
249
2
0
14 Apr 2025
Enhancing Multi-task Learning Capability of Medical Generalist Foundation Model via Image-centric Multi-annotation Data
Xun Zhu
Fanbin Mo
Zheng Zhang
Jing Wang
Yiming Shi
Ming Wu
Chuang Zhang
Chenyi Guo
Ji Wu
287
0
0
14 Apr 2025
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Cheng-Yu Hsieh
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Hadi Pouransari
VLM
961
0
0
11 Apr 2025
VC-LLM: Automated Advertisement Video Creation from Raw Footage using Multi-modal LLMs
Dongjun Qian
Kai Su
Yiming Tan
Qishuai Diao
Xian Wu
Chang Liu
Zehuan Yuan
VGen
182
1
0
08 Apr 2025
REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding
Sakib Reza
Xiyun Song
Heather Yu
Zongfang Lin
Mohsen Moghaddam
Mario Sznaier
274
0
0
07 Apr 2025
Rip Current Segmentation: A Novel Benchmark and YOLOv8 Baseline Results
Andrei Dumitriu
Florin Tatui
Florin Miron
Radu Tudor Ionescu
Radu Timofte
386
34
0
03 Apr 2025
Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery
Mykola Lavreniuk
Nataliia Kussul
Andrii Shelestov
Bohdan Yailymov
Yevhenii Salii
Volodymyr Kuzin
Zoltan Szantoi
247
2
0
03 Apr 2025
Enhanced Cross-modal 3D Retrieval via Tri-modal Reconstruction
Junlong Ren
Hao Wang
319
2
0
02 Apr 2025
RipVIS: Rip Currents Video Instance Segmentation Benchmark for Beach Monitoring and Safety
Computer Vision and Pattern Recognition (CVPR), 2025
Andrei Dumitriu
Florin Tatui
Florin Miron
Aakash Ralhan
Radu Tudor Ionescu
Radu Timofte
365
1
0
01 Apr 2025
Evaluating Text-to-Image and Text-to-Video Synthesis with a Conditional Fréchet Distance
Jaywon Koo
J. Hernandez
Moayed Haji-Ali
Ziyan Yang
Vicente Ordonez
EGVM
320
0
0
27 Mar 2025
Vision as LoRA
Han Wang
Yongjie Ye
Bingru Li
Yuxiang Nie
Jinghui Lu
Jingqun Tang
Yanjie Wang
Can Huang
376
12
0
26 Mar 2025
Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders
Paul Koch
Jörg Krüger
Ankit Chowdhury
O. Heimann
MDE
278
0
0
25 Mar 2025
Scaling Vision Pre-Training to 4K Resolution
Computer Vision and Pattern Recognition (CVPR), 2025
Baifeng Shi
Boyi Li
Han Cai
Yaojie Lu
Sifei Liu
...
Jan Kautz
Enze Xie
Trevor Darrell
Pavlo Molchanov
Hongxu Yin
CLIP
905
12
0
25 Mar 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Computer Vision and Pattern Recognition (CVPR), 2025
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
309
5
0
21 Mar 2025
REVAL: A Comprehension Evaluation on Reliability and Values of Large Vision-Language Models
Jie M. Zhang
Zheng Yuan
Ziyi Wang
Bei Yan
Sibo Wang
Xiangkui Cao
Zonghui Guo
Shiguang Shan
Xilin Chen
ELM
326
2
0
20 Mar 2025
Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance
Hui Liu
Wenya Wang
Kecheng Chen
Jie Liu
Yibing Liu
Tiexin Qin
Peisong He
Xinghao Jiang
Haoliang Li
BDL, VLM
947
0
0
20 Mar 2025