
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

Computer Vision and Pattern Recognition (CVPR), 2022
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM, CLIP
arXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub (2496★)

Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 579 papers shown
Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
Hao Yi
Qingyang Li
Yihan Hu
Fuzheng Zhang
Di Zhang
Yong Liu
VGen
355
0
0
25 Nov 2024
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions
Computer Vision and Pattern Recognition (CVPR), 2024
Faridoun Mehri
Mahdieh Soleymani Baghshah
Mohammad Taher Pilehvar
293
3
0
24 Nov 2024
ReWind: Understanding Long Videos with Instructed Learnable Memory
Computer Vision and Pattern Recognition (CVPR), 2024
Anxhelo Diko
Tinghuai Wang
Wassim Swaileh
Shiyan Sun
Ioannis Patras
KELM, VLM
370
4
0
23 Nov 2024
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Computer Vision and Pattern Recognition (CVPR), 2024
Zhangqi Jiang
Junkai Chen
Beier Zhu
Tingjin Luo
Yankun Shen
Xu Yang
528
49
0
23 Nov 2024
Chanel-Orderer: A Channel-Ordering Predictor for Tri-Channel Natural Images
Shen Li
Lei Jiang
Wei Wang
Hongwei Hu
Liang Li
347
0
0
20 Nov 2024
Generative Timelines for Instructed Visual Assembly
Alejandro Pardo
Jui-hsien Wang
Guohao Li
Josef Sivic
Bryan C. Russell
Fabian Caba Heilbron
VGen
272
0
0
19 Nov 2024
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
664
2
0
15 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Neural Information Processing Systems (NeurIPS), 2024
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP, VLM
419
7
0
05 Nov 2024
UniGuard: Towards Universal Safety Guardrails for Jailbreak Attacks on Multimodal Large Language Models
Sejoon Oh
Yiqiao Jin
Megha Sharma
Donghyun Kim
Eric Ma
Gaurav Verma
Srijan Kumar
332
12
0
03 Nov 2024
Tracking one-in-a-million: Large-scale benchmark for microbial single-cell tracking with experiment-aware robustness metrics
J. Seiffarth
L. Blöbaum
R. D. Paul
N. Friederich
A. J. Yamachui Sitcheu
R. Mikut
H. Scharr
A. Grünberger
K. Nöh
221
5
0
01 Nov 2024
Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding
IEEE International Symposium on Biomedical Imaging (ISBI), 2024
Jinlong He
Pengfei Li
Gang Liu
Shenjun Zhong
233
5
0
31 Oct 2024
Multilingual Vision-Language Pre-training for the Remote Sensing Domain
João Daniel Silva
João Magalhães
D. Tuia
Bruno Martins
CLIP, VLM
236
6
0
30 Oct 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
Wen Liu
Xinyu Wang
VLM, MLLM, LRM
307
77
0
29 Oct 2024
Your Image is Secretly the Last Frame of a Pseudo Video
Wenlong Chen
Wenlin Chen
Lapo Rastrelli
Yingzhen Li
DiffM, VGen
377
0
0
26 Oct 2024
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
International Conference on Learning Representations (ICLR), 2024
Kim Sung-Bin
Oh Hyun-Bin
JungMok Lee
Arda Senocak
Joon Son Chung
Tae-Hyun Oh
MLLM, VLM
445
15
0
23 Oct 2024
PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context
Maximilian Augustin
Syed Shakib Sarwar
Mostafa Elhoushi
Sai Qian Zhang
Yuecheng Li
B. D. Salvo
253
1
0
23 Oct 2024
Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Cheng Lei
Jie Fan
Xinran Li
Tianzhu Xiang
Ao Li
Ce Zhu
Le Zhang
277
4
0
22 Oct 2024
Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly
Neural Information Processing Systems (NeurIPS), 2024
Junsheng Zhou
Yu-Shen Liu
Zhizhong Han
ViT
284
21
0
21 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
International Conference on Learning Representations (ICLR), 2024
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
439
17
0
21 Oct 2024
A Survey of Hallucination in Large Visual Language Models
Wei Lan
Wenyi Chen
Qingfeng Chen
Shirui Pan
Huiyu Zhou
Yi-Lun Pan
LRM
315
11
0
20 Oct 2024
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training
IEEE Transactions on Multimedia (IEEE TMM), 2024
Muhe Ding
Yang Ma
Pengda Qin
Yue Yu
Yuhong Li
Liqiang Nie
244
4
0
18 Oct 2024
ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
Yin Xie
Kaicheng Yang
Ninghua Yang
Weimo Deng
Xiangzi Dai
Tiancheng Gu
Yumeng Wang
Xiang An
Yongle Zhao
Ziyong Feng
MLLM, VLM
360
1
0
18 Oct 2024
Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond
Pengwei Liang
Junjun Jiang
Qing Ma
Xianming Liu
Jiayi Ma
219
5
0
16 Oct 2024
VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models
Xiaohan Lan
Yitian Yuan
Zequn Jie
Lin Ma
VLM
181
4
0
15 Oct 2024
Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models
Zhongye Liu
Hongbin Liu
Yuepeng Hu
Zedian Shao
Neil Zhenqiang Gong
VLM, MLLM
158
1
0
15 Oct 2024
Browsing without Third-Party Cookies: What Do You See?
ACM/SIGCOMM Internet Measurement Conference (IMC), 2024
Maxwell Lin
Shihan Lin
Helen Wu
Karen Wang
Xiaowei Yang
BDL
477
45
0
14 Oct 2024
Locality Alignment Improves Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Ian Covert
Tony Sun
James Zou
Tatsunori Hashimoto
VLM
592
11
0
14 Oct 2024
Large-Scale 3D Medical Image Pre-training with Geometric Context Priors
Linshan Wu
Jiaxin Zhuang
Hao Chen
220
20
0
13 Oct 2024
Large Model for Small Data: Foundation Model for Cross-Modal RF Human Activity Recognition
ACM International Conference on Embedded Networked Sensor Systems (SenSys), 2024
Yuxuan Weng
Guoquan Wu
Tianyue Zheng
Yanbing Yang
Jun Luo
261
17
0
13 Oct 2024
Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models
Neural Information Processing Systems (NeurIPS), 2024
Mengyuan Chen
Junyu Gao
Changsheng Xu
VLM, OODD
332
12
0
11 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Neural Information Processing Systems (NeurIPS), 2024
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
434
22
0
10 Oct 2024
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
International Conference on Learning Representations (ICLR), 2024
Haoyi Zhu
Honghui Yang
Yating Wang
Jiange Yang
Limin Wang
Tong He
3DH
381
23
0
10 Oct 2024
Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models
ACM Multimedia (MM), 2024
Yubo Wang
Chaohu Liu
Yanqiu Qu
Haoyu Cao
Deqiang Jiang
Linli Xu
MLLM, AAML
150
15
0
09 Oct 2024
From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models
Yuying Shang
Xinyi Zeng
Yutao Zhu
Xiao Yang
Zhengwei Fang
Jingyuan Zhang
Jiawei Chen
Zinan Liu
Yu Tian
VLM, MLLM
817
2
0
09 Oct 2024
Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See
Phu Pham
Kun Wan
Yu-Jhe Li
Zeliang Zhang
Daniel Miranda
Ajinkya Kale
Chenliang Xu
249
1
0
08 Oct 2024
AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Jiaming Zhang
Junhong Ye
Xingjun Ma
Yige Li
Yunfan Yang
Jitao Sang
Dit-Yan Yeung
AAML, VLM
302
0
0
07 Oct 2024
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
International Conference on Learning Representations (ICLR), 2024
Wanpeng Zhang
Zilong Xie
Yicheng Feng
Yijiang Li
Xingrun Xing
Sipeng Zheng
Zongqing Lu
MLLM
355
10
0
03 Oct 2024
UlcerGPT: A Multimodal Approach Leveraging Large Language and Vision Models for Diabetic Foot Ulcer Image Transcription
International Conference on Pattern Recognition (ICPR), 2024
Reza Basiri
Ali Abedi
Chau Nguyen
Milos R. Popovic
Shehroz S. Khan
LM&MA, MedIm
98
4
0
02 Oct 2024
EMMA: Efficient Visual Alignment in Multi-Modal LLMs
Sara Ghazanfari
Alexandre Araujo
Prashanth Krishnamurthy
Siddharth Garg
Farshad Khorrami
VLM
301
7
0
02 Oct 2024
HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Fan Yuan
Chi Qin
Xiaogang Xu
Piji Li
VLM, MLLM
162
9
0
30 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Neural Information Processing Systems (NeurIPS), 2024
Ye Liu
Zongyang Ma
Chen Ma
Yang Wu
Ying Shan
Chang Wen Chen
267
52
0
26 Sep 2024
VLM's Eye Examination: Instruct and Inspect Visual Competency of Vision Language Models
Nam Hyeon-Woo
Moon Ye-Bin
Wonseok Choi
Lee Hyun
Tae-Hyun Oh
CoGe
249
3
0
23 Sep 2024
Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization
Minyi Zhao
Jie Wang
Zerui Li
Jiyuan Zhang
Zhenbang Sun
Shuigeng Zhou
MLLM, VLM
319
3
0
22 Sep 2024
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training
Yiyi Tao
Zhuoyue Wang
Hang Zhang
Lun Wang
VLM
326
28
0
15 Sep 2024
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
Dingxin Cheng
Mingda Li
Jingyu Liu
Yongxin Guo
Bin Jiang
Qingbin Liu
Xi Chen
Bo Zhao
271
12
0
10 Sep 2024
Revisiting Prompt Pretraining of Vision-Language Models
Zhenyuan Chen
Lingfeng Yang
Shuo Chen
Zhaowei Chen
Jiajun Liang
Xiang Li
MLLM, VP, VLM
363
4
0
10 Sep 2024
Seeing Through the Mask: Rethinking Adversarial Examples for CAPTCHAs
Yahya Jabary
Andreas Plesner
Turlan Kuzhagaliyev
Roger Wattenhofer
AAML
195
2
0
09 Sep 2024
Top-GAP: Integrating Size Priors in CNNs for more Interpretability, Robustness, and Bias Mitigation
Lars Nieradzik
Henrike Stephani
Janis Keuper
FAtt, AAML
249
1
0
07 Sep 2024
Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment
Similarity Search and Applications (SISAP), 2024
Konstantin Schall
Kai Uwe Barthel
Nico Hezel
Klaus Jung
VLM
291
8
0
03 Sep 2024
Understanding Multimodal Hallucination with Parameter-Free Representation Alignment
Yueqian Wang
Jianxin Liang
Yuxuan Wang
Huishuai Zhang
Dongyan Zhao
237
2
0
02 Sep 2024