EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2022
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM, CLIP

Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 579 papers shown
IVGF: The Fusion-Guided Infrared and Visible General Framework
Fangcen Liu
Chenqiang Gao
Fang Chen
Pengcheng Li
Junjie Guo
Deyu Meng
376
1
0
02 Sep 2024
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding
Yonghui Wang
Wengang Zhou
Hao Feng
Houqiang Li
VLM
163
1
0
30 Aug 2024
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure
Jia-Fong Yeh
Min-Hung Chen
Hung-Ting Su
S. Lai
Winston H. Hsu
421
0
0
30 Aug 2024
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi
Fuxiao Liu
Shihao Wang
Shijia Liao
Subhashree Radhakrishnan
...
Andrew Tao
Zhiding Yu
Guilin Liu
MLLM
402
115
0
28 Aug 2024
A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models
Dibaloke Chanda
Milan Aryal
Nasim Yahya Soltani
Masoud Ganji
AI4CE, VLM
426
11
0
23 Aug 2024
CathAction: A Benchmark for Endovascular Intervention Understanding
Baoru Huang
Tuan Vo
Chayun Kongtongvattana
G. Dagnino
Dennis Kundrat
...
Francisco Vasconcelos
Danail Stoyanov
Daniel Elson
Ferdinando Rodriguez y Baena
Anh Nguyen
192
6
0
23 Aug 2024
Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey
Information Fusion (Inf. Fusion), 2024
Qika Lin
Yifan Zhu
Xin Mei
Ling Huang
Jingying Ma
Kai He
Zhen Peng
Xiaoshi Zhong
Mengling Feng
287
61
0
23 Aug 2024
Semantic Alignment for Multimodal Large Language Models
ACM Multimedia (MM), 2024
Tao Wu
Mengze Li
Jingyuan Chen
Wei Ji
Wang Lin
Jinyang Gao
Kun Kuang
Zhou Zhao
Fei Wu
211
18
0
23 Aug 2024
Sapiens: Foundation for Human Vision Models
European Conference on Computer Vision (ECCV), 2024
Rawal Khirodkar
Timur M. Bagautdinov
Julieta Martinez
Su Zhaoen
Austin James
Peter Selednik
Stuart Anderson
Forrest Iandola
VLM
442
167
0
22 Aug 2024
OE3DIS: Open-Ended 3D Point Cloud Instance Segmentation
P. Nguyen
Minh Luu
Anh Tran
Cuong Pham
Khoi Duc Minh Nguyen
3DPC
286
1
0
21 Aug 2024
Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jiandong Jin
Xiao Wang
Qian Zhu
Haiyang Wang
Chenglong Li
VLM
175
12
0
19 Aug 2024
5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks
Computer Vision and Pattern Recognition (CVPR), 2024
Dongshuo Yin
Leiyi Hu
Bin Li
Youqun Zhang
Xue Yang
388
37
0
15 Aug 2024
Masked Image Modeling: A Survey
International Journal of Computer Vision (IJCV), 2024
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
Andrii Zadaianchuk
481
20
0
13 Aug 2024
Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes
Ke Zhou
Zhongwei Qiu
Dongmei Fu
VLM
196
5
0
12 Aug 2024
Efficient Test-Time Prompt Tuning for Vision-Language Models
Yuhan Zhu
Guozhen Zhang
Chen Xu
Haocheng Shen
Xiaoxin Chen
Gangshan Wu
Limin Wang
VLM
281
8
0
11 Aug 2024
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
European Conference on Computer Vision (ECCV), 2024
Yifan Pu
Zhuofan Xia
Jiayi Guo
Dongchen Han
Qixiu Li
...
Ji Li
Yizeng Han
Shiji Song
Gao Huang
Xiu Li
331
22
0
11 Aug 2024
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
Neural Information Processing Systems (NeurIPS), 2024
Haider Al-Tahan
Q. Garrido
Randall Balestriero
Diane Bouchacourt
C. Hazirbas
Mark Ibrahim
VLM
283
22
0
09 Aug 2024
How Well Can Vision Language Models See Image Details?
Chenhui Gou
Abdulwahab Felemban
Faizan Farooq Khan
Deyao Zhu
Jianfei Cai
Hamid Rezatofighi
Mohamed Elhoseiny
VLM, MLLM
236
12
0
07 Aug 2024
A Novel Evaluation Framework for Image2Text Generation
Jia-Hong Huang
Hongyi Zhu
Yixian Shen
Stevan Rudinac
A. M. Pacces
Evangelos Kanoulas
237
11
0
03 Aug 2024
Multi-Frame Vision-Language Model for Long-form Reasoning in Driver Behavior Analysis
Hiroshi Takato
Hiroshi Tsutsui
Komei Soda
Hidetaka Kamigaito
VLM
258
2
0
03 Aug 2024
EZSR: Event-based Zero-Shot Recognition
Yan Yang
Sehwan Kim
Dongxu Li
Y. Sun
243
2
0
31 Jul 2024
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving
Peiru Zheng
Yun Zhao
Zhan Gong
Hong Zhu
Shaohua Wu
MLLM
249
13
0
31 Jul 2024
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models
Ali Abdollahi
Mahdi Ghaznavi
Mohammad Reza Karimi Nejad
Arash Mari Oriyad
Reza Abbasi
Ali Salesi
Melika Behjati
M. Rohban
M. Baghshah
CoGe
398
3
0
30 Jul 2024
UniProcessor: A Text-induced Unified Low-level Image Processor
Huiyu Duan
Xiongkuo Min
Sijing Wu
Wei Shen
Guangtao Zhai
DiffM
190
15
0
30 Jul 2024
Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation
Jingjun Yi
Qi Bi
Hao Zheng
Haolan Zhan
Wei Ji
Yawen Huang
Yuexiang Li
Yefeng Zheng
291
29
0
26 Jul 2024
Cost-effective Instruction Learning for Pathology Vision and Language Analysis
Kaitao Chen
Mianxin Liu
Fang Yan
Lei Ma
Xiaoming Shi
...
Xiaosong Wang
Lifeng Zhu
Zhe Wang
Mu Zhou
Shaoting Zhang
336
6
0
25 Jul 2024
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li
Junfeng Wu
Weizhi Zhao
Song Bai
Xiang Bai
229
13
0
23 Jul 2024
QPT V2: Masked Image Modeling Advances Visual Scoring
Qizhi Xie
Kun Yuan
Yunpeng Qu
Mingda Wu
Ming Sun
Chao Zhou
Jihong Zhu
233
6
0
23 Jul 2024
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Pengfei Chen
Lingxi Xie
Xinyue Huo
Xuehui Yu
Xiaopeng Zhang
Yingfei Sun
Zhenjun Han
Qi Tian
VLM
481
5
0
23 Jul 2024
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Yangzhou Liu
Yue Cao
Zhangwei Gao
Weiyun Wang
Zhe Chen
...
Lewei Lu
Xizhou Zhu
Tong Lu
Yu Qiao
Jifeng Dai
VLM, MLLM
311
41
0
22 Jul 2024
End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling
Jianxin Liang
Xiaojun Meng
Yueqian Wang
Chang Liu
Qun Liu
Dongyan Zhao
197
12
0
21 Jul 2024
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
S. Swetha
Jinyu Yang
T. Neiman
Mamshad Nayeem Rizve
Son Tran
Benjamin Z. Yao
Trishul Chilimbi
Mubarak Shah
268
9
0
18 Jul 2024
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
Leyang Shen
Gongwei Chen
Rui Shao
Weili Guan
Liqiang Nie
MoE
197
34
0
17 Jul 2024
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
Gengze Zhou
Yicong Hong
Zun Wang
Xin Eric Wang
Qi Wu
LM&Ro
312
74
0
17 Jul 2024
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification
Markus Marks
Manuel Knott
Neehar Kondapaneni
Elijah Cole
T. Defraeye
Fernando Pérez-Cruz
Pietro Perona
SSL
399
15
0
16 Jul 2024
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
Zehan Wang
Ziang Zhang
Hang Zhang
Luping Liu
Rongjie Huang
Xize Cheng
Hengshuang Zhao
Zhou Zhao
297
26
0
16 Jul 2024
Refusing Safe Prompts for Multi-modal Large Language Models
Zedian Shao
Hongbin Liu
Yuepeng Hu
Neil Zhenqiang Gong
MLLM, LRM
218
4
0
12 Jul 2024
Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
Byeonghyun Pak
Byeongju Woo
Sunghwan Kim
Dae-Hwan Kim
Hoseong Kim
420
20
0
12 Jul 2024
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Xiaotong Li
Fan Zhang
Haiwen Diao
Yueze Wang
Xinlong Wang
Ling-yu Duan
VLM
339
49
0
11 Jul 2024
Bayesian Detector Combination for Object Detection with Crowdsourced Annotations
Zhi Qin Tan
Olga Isupova
Gustavo Carneiro
Xiatian Zhu
Yunpeng Li
ObjD
197
1
0
10 Jul 2024
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
Haiwen Diao
Bo Wan
Xu Jia
Yunzhi Zhuge
Ying Zhang
Huchuan Lu
Long Chen
VLM
240
11
0
10 Jul 2024
A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends
Daizong Liu
Mingyu Yang
Xiaoye Qu
Pan Zhou
Yu Cheng
Wei Hu
ELM, AAML
344
73
0
10 Jul 2024
VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving
Yibo Liu
Zheyuan Yang
Guile Wu
Y. Ren
Kejian Lin
Bingbing Liu
Yang Liu
Jinjun Shan
254
10
0
09 Jul 2024
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
Yuhan Zhu
Yuyang Ji
Zhiyu Zhao
Gangshan Wu
Limin Wang
VLM
321
24
0
05 Jul 2024
ArAIEval Shared Task: Propagandistic Techniques Detection in Unimodal and Multimodal Arabic Content
Maram Hasanain
Md. Arid Hasan
Fatema Ahmed
Reem Suwaileh
Wajdi Zaghouani
Firoj Alam
VLM
220
25
0
05 Jul 2024
Precision at Scale: Domain-Specific Datasets On-Demand
Jesús M. Rodríguez-de-Vera
Imanol G. Estepa
Ignacio Sarasúa
Bhalaji Nagarajan
Petia Radeva
250
3
0
03 Jul 2024
FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models
Ruinan Jin
Zikang Xu
Yuan Zhong
Qiongsong Yao
Qi Dou
S. Kevin Zhou
Xiaoxiao Li
VLM
395
43
0
01 Jul 2024
From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models
Mehar Bhatia
Sahithya Ravi
Aditya Chinchure
EunJeong Hwang
Vered Shwartz
VLM
334
10
0
28 Jun 2024
Chrono: A Simple Blueprint for Representing Time in MLLMs
Boris Meinardus
Anil Batra
Anna Rohrbach
Marcus Rohrbach
MLLM, VLM
588
4
0
26 Jun 2024
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification
Gregor Geigle
Radu Timofte
Goran Glavaš
217
18
0
20 Jun 2024