ResearchTrend.AI

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
    VLM
    CLIP

Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 507 papers shown
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang
Sarah Monazam Erfani
Yige Li
Xingjun Ma
James Bailey
AAML
34
0
0
08 May 2025
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Cunxin Fan
Xiaosong Jia
Yihang Sun
Yixiao Wang
Jianglan Wei
...
Xiangyu Zhao
M. Tomizuka
Xue Yang
Junchi Yan
Mingyu Ding
LM&Ro
VLM
54
2
0
04 May 2025
DEEMO: De-identity Multimodal Emotion Recognition and Reasoning
Deng Li
Bohao Xing
Xin Liu
Baiqiang Xia
Bihan Wen
H. Kalviainen
VLM
68
0
0
28 Apr 2025
MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation
Siyi Jiao
Wenzheng Zeng
Y. Li
H. Zhang
Changxin Gao
Nong Sang
Mike Zheng Shou
12
0
0
20 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation
Siyu Chen
Ting Han
Changshe Zhang
Xin Luo
Meiliu Wu
Guorong Cai
Jinhe Su
MDE
32
0
0
17 Apr 2025
Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis
Shravan Chaudhari
Trilokya Akula
Yoon Kim
Tom Blake
LRM
40
0
0
16 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
X. Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
26
1
0
14 Apr 2025
CleanMAP: Distilling Multimodal LLMs for Confidence-Driven Crowdsourced HD Map Updates
Ankit Kumar Shaw
Kun Jiang
Tuopu Wen
Chandan Kumar Sah
Yining Shi
Mengmeng Yang
D. Yang
Xiaoli Lian
26
0
0
14 Apr 2025
Enhancing Multi-task Learning Capability of Medical Generalist Foundation Model via Image-centric Multi-annotation Data
Xun Zhu
Fanbin Mo
Zheng Zhang
J. Wang
Yiming Shi
Ming Wu
Chuang Zhang
Miao Li
Ji Wu
24
0
0
14 Apr 2025
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Cheng-Yu Hsieh
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Hadi Pouransari
VLM
61
0
0
11 Apr 2025
VC-LLM: Automated Advertisement Video Creation from Raw Footage using Multi-modal LLMs
Dongjun Qian
Kai Su
Yiming Tan
Qishuai Diao
Xian Wu
Chang Liu
Bingyue Peng
Zehuan Yuan
VGen
16
0
0
08 Apr 2025
REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding
Sakib Reza
Xiyun Song
Heather Yu
Zongfang Lin
Mohsen Moghaddam
Octavia Camps
23
0
0
07 Apr 2025
Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery
Mykola Lavreniuk
Nataliia Kussul
Andrii Shelestov
Bohdan Yailymov
Yevhenii Salii
Volodymyr Kuzin
Zoltan Szantoi
29
0
0
03 Apr 2025
Rip Current Segmentation: A Novel Benchmark and YOLOv8 Baseline Results
Andrei Dumitriu
Florin Tatui
Florin Miron
Radu Tudor Ionescu
Radu Timofte
37
19
0
03 Apr 2025
Enhanced Cross-modal 3D Retrieval via Tri-modal Reconstruction
Junlong Ren
Hao Wang
36
0
0
02 Apr 2025
RipVIS: Rip Currents Video Instance Segmentation Benchmark for Beach Monitoring and Safety
Andrei Dumitriu
Florin Tatui
Florin Miron
Aakash Ralhan
Radu Tudor Ionescu
Radu Timofte
26
0
0
01 Apr 2025
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance
Jaywon Koo
J. Hernandez
Moayed Haji-Ali
Ziyan Yang
Vicente Ordonez
EGVM
67
0
0
27 Mar 2025
Vision as LoRA
Han Wang
Yongjie Ye
Bingru Li
Yuxiang Nie
Jinghui Lu
Jingqun Tang
Yanjie Wang
Can Huang
86
0
0
26 Mar 2025
Scaling Vision Pre-Training to 4K Resolution
Baifeng Shi
Boyi Li
Han Cai
Y. Lu
Sifei Liu
...
Jan Kautz
Song Han
Trevor Darrell
Pavlo Molchanov
Hongxu Yin
CLIP
53
0
0
25 Mar 2025
Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders
Paul Koch
Jörg Krüger
Ankit Chowdhury
O. Heimann
MDE
51
0
0
25 Mar 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
48
1
0
21 Mar 2025
Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance
Hui Liu
Wenya Wang
Kecheng Chen
Jie Liu
Yibing Liu
Tiexin Qin
Peisong He
Xinghao Jiang
Haoliang Li
BDL
VLM
78
0
0
20 Mar 2025
REVAL: A Comprehension Evaluation on Reliability and Values of Large Vision-Language Models
Jie M. Zhang
Zheng Yuan
Z. Wang
Bei Yan
Sibo Wang
Xiangkui Cao
Zonghui Guo
Shiguang Shan
Xilin Chen
ELM
36
0
0
20 Mar 2025
Visual Position Prompt for MLLM based Visual Grounding
Wei Tang
Yanpeng Sun
Qinying Gu
Zechao Li
VLM
45
0
0
19 Mar 2025
Exploring Disparity-Accuracy Trade-offs in Face Recognition Systems: The Role of Datasets, Architectures, and Loss Functions
S. Jaiswal
Sagnik Basu
Sandipan Sikdar
Animesh Mukherjee
33
0
0
18 Mar 2025
CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model
Yuxuan Luo
Jiaqi Tang
Chenyi Huang
Feiyang Hao
Zhouhui Lian
VLM
56
0
0
13 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
51
0
0
12 Mar 2025
Multi-Modal Foundation Models for Computational Pathology: A Survey
Dong Li
Guihong Wan
Xintao Wu
Xinyu Wu
Xiaohui Chen
Yi He
Christine G. Lian
Peter K. Sorger
Yevgeniy R. Semenov
Chen Zhao
MedIm
44
0
0
12 Mar 2025
Scale-Aware Pre-Training for Human-Centric Visual Perception: Enabling Lightweight and Generalizable Models
Xuanhan Wang
Huimin Deng
Lianli Gao
Jingkuan Song
VLM
47
0
0
11 Mar 2025
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking
Chaocan Xue
Bineng Zhong
Qihua Liang
Yaozong Zheng
Ning Li
Yuanliang Xue
Shuxiang Song
36
0
0
09 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Y. Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
50
0
0
08 Mar 2025
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
Paul Janson
Vaibhav Singh
Paria Mehrbod
Adam Ibrahim
Irina Rish
Eugene Belilovsky
Benjamin Thérien
CLL
73
0
0
04 Mar 2025
Generalizable Prompt Learning of CLIP: A Brief Overview
Fangming Cui
Yonggang Zhang
Xuan Wang
Xule Wang
Liang Xiao
VPVLM
VLM
73
0
0
03 Mar 2025
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
Ziyang Zhang
Yang Yu
Yucheng Chen
Xulei Yang
S. Yeo
MedIm
45
1
0
02 Mar 2025
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu
James Liang
Ruixiang Tang
Yugyung Lee
Majid Rabbani
...
Raghuveer M. Rao
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
48
0
0
02 Mar 2025
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
Shangzhe Di
Zhelun Yu
Guanghao Zhang
Haoyuan Li
Tao Zhong
Hao Cheng
Bolin Li
Wanggui He
Fangxun Shu
Hao Jiang
63
4
0
01 Mar 2025
Towards High-performance Spiking Transformers from ANN to SNN Conversion
Zihan Huang
Xinyu Shi
Zecheng Hao
Tong Bu
Jianhao Ding
Zhaofei Yu
Tiejun Huang
28
7
0
28 Feb 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Zhaoyi Liu
Huan Zhang
AAML
72
0
0
25 Feb 2025
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
Haoyuan Li
Yanpeng Zhou
Tao Tang
Jifei Song
Yihan Zeng
Michael C. Kampffmeyer
Hang Xu
Xiaodan Liang
3DGS
57
1
0
25 Feb 2025
Pretrained Image-Text Models are Secretly Video Captioners
Chunhui Zhang
Yiren Jian
Z. Ouyang
Soroush Vosoughi
VLM
63
3
0
20 Feb 2025
VAQUUM: Are Vague Quantifiers Grounded in Visual Data?
Hugh Mee Wong
Rick Nouwen
Albert Gatt
46
0
0
17 Feb 2025
Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions
Ming Shan Hee
Roy Ka-Wei Lee
VLM
75
0
0
16 Feb 2025
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Zhenxing Mi
Kuan-Chieh Jackson Wang
Guocheng Qian
Hanrong Ye
Runtao Liu
Sergey Tulyakov
Kfir Aberman
Dan Xu
LRM
42
0
0
12 Feb 2025
HCMRM: A High-Consistency Multimodal Relevance Model for Search Ads
Guobing Gan
Kaiming Gao
Li Wang
Shen Jiang
Peng Jiang
59
0
0
09 Feb 2025
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
Tao Zhang
Jinyong Wen
Zhen Chen
Kun Ding
S. Xiang
Chunhong Pan
70
1
0
04 Feb 2025
Towards Robust Multimodal Large Language Models Against Jailbreak Attacks
Ziyi Yin
Yuanpu Cao
Han Liu
Ting Wang
Jinghui Chen
Fenglong Ma
AAML
47
0
0
02 Feb 2025
Vision-Language Model Selection and Reuse for Downstream Adaptation
Hao-Zhe Tan
Zhi-Hua Zhou
Lan-Zhe Guo
Yu-Feng Li
VLM
88
0
0
30 Jan 2025
Rethinking Encoder-Decoder Flow Through Shared Structures
Frederik Laboyrie
M. K. Yucel
Albert Saà-Garriga
AI4CE
40
0
0
24 Jan 2025
Patent Figure Classification using Large Vision-language Models
Sushil Awale
Eric Müller-Budack
Ralph Ewerth
31
0
0
22 Jan 2025