EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
    VLM
    CLIP
arXiv: 2211.07636

Papers citing "EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"

50 / 507 papers shown
Large Model for Small Data: Foundation Model for Cross-Modal RF Human Activity Recognition
Yuxuan Weng
Guoquan Wu
Tianyue Zheng
Yanbing Yang
Jun-Jie Luo
16
5
0
13 Oct 2024
Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models
Mengyuan Chen
Junyu Gao
Changsheng Xu
VLM
OODD
23
0
0
11 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
21
5
0
10 Oct 2024
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu
Honghui Yang
Yating Wang
Jiange Yang
Limin Wang
Tong He
3DH
43
5
0
10 Oct 2024
From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models
Yuying Shang
Xinyi Zeng
Yutao Zhu
Xiao Yang
Zhengwei Fang
Jingyuan Zhang
Jiawei Chen
Zinan Liu
Yu Tian
VLM
MLLM
33
1
0
09 Oct 2024
Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models
Yubo Wang
Chaohu Liu
Yanqiu Qu
Haoyu Cao
Deqiang Jiang
Linli Xu
MLLM
AAML
14
3
0
09 Oct 2024
Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See
Phu Pham
Kun Wan
Yu-Jhe Li
Zeliang Zhang
Daniel Miranda
Ajinkya Kale
Chenliang Xu
22
5
0
08 Oct 2024
AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
Jiaming Zhang
Junhong Ye
Xingjun Ma
Yige Li
Yunfan Yang
Jitao Sang
Dit-Yan Yeung
AAML
VLM
24
0
0
07 Oct 2024
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Wanpeng Zhang
Zilong Xie
Yicheng Feng
Yijiang Li
Xingrun Xing
Sipeng Zheng
Zongqing Lu
MLLM
18
0
0
03 Oct 2024
UlcerGPT: A Multimodal Approach Leveraging Large Language and Vision Models for Diabetic Foot Ulcer Image Transcription
Reza Basiri
Ali Abedi
Chau Nguyen
Milos R. Popovic
Shehroz S. Khan
LM&MA
MedIm
31
1
0
02 Oct 2024
HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding
Fan Yuan
Chi Qin
Xiaogang Xu
Piji Li
VLM
MLLM
17
4
0
30 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
31
15
0
26 Sep 2024
VLM's Eye Examination: Instruct and Inspect Visual Competency of Vision Language Models
Nam Hyeon-Woo
Moon Ye-Bin
Wonseok Choi
Lee Hyun
Tae-Hyun Oh
CoGe
23
3
0
23 Sep 2024
Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization
Minyi Zhao
Jie Wang
Z. Li
Jiyuan Zhang
Zhenbang Sun
Shuigeng Zhou
MLLM
VLM
17
0
0
22 Sep 2024
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training
Yiyi Tao
Zhuoyue Wang
Hang Zhang
Lun Wang
VLM
38
13
0
15 Sep 2024
Enhancing Long Video Understanding via Hierarchical Event-Based Memory
Dingxin Cheng
Mingda Li
Jingyu Liu
Yongxin Guo
Bin Jiang
Qingbin Liu
Xi Chen
Bo Zhao
22
4
0
10 Sep 2024
Revisiting Prompt Pretraining of Vision-Language Models
Zhenyuan Chen
Lingfeng Yang
Shuo Chen
Zhaowei Chen
Jiajun Liang
Xiang Li
MLLM
VPVLM
VLM
33
1
0
10 Sep 2024
Seeing Through the Mask: Rethinking Adversarial Examples for CAPTCHAs
Yahya Jabary
Andreas Plesner
Turlan Kuzhagaliyev
Roger Wattenhofer
AAML
18
0
0
09 Sep 2024
Top-GAP: Integrating Size Priors in CNNs for more Interpretability, Robustness, and Bias Mitigation
Lars Nieradzik
Henrike Stephani
Janis Keuper
FAtt
AAML
36
0
0
07 Sep 2024
Optimizing CLIP Models for Image Retrieval with Maintained Joint-Embedding Alignment
Konstantin Schall
Kai Uwe Barthel
Nico Hezel
Klaus Jung
VLM
23
3
0
03 Sep 2024
Understanding Multimodal Hallucination with Parameter-Free Representation Alignment
Yueqian Wang
Jianxin Liang
Yuxuan Wang
Huishuai Zhang
Dongyan Zhao
39
1
0
02 Sep 2024
IVGF: The Fusion-Guided Infrared and Visible General Framework
Fangcen Liu
Chenqiang Gao
Fang Chen
Pengcheng Li
Junjie Guo
Deyu Meng
29
0
0
02 Sep 2024
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure
Jia-Fong Yeh
Min-Hung Chen
Hung-Ting Su
Winston H. Hsu
Shang-Hong Lai
18
3
0
30 Aug 2024
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding
Yonghui Wang
Wengang Zhou
Hao Feng
Houqiang Li
VLM
22
0
0
30 Aug 2024
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi
Fuxiao Liu
Shihao Wang
Shijia Liao
Subhashree Radhakrishnan
...
Andrew Tao
Zhiding Yu
Guilin Liu
MLLM
18
53
0
28 Aug 2024
A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models
Dibaloke Chanda
Milan Aryal
Nasim Yahya Soltani
Masoud Ganji
AI4CE
VLM
34
7
0
23 Aug 2024
CathAction: A Benchmark for Endovascular Intervention Understanding
Baoru Huang
Tuan Vo
Chayun Kongtongvattana
G. Dagnino
Dennis Kundrat
...
Francisco Vasconcelos
Danail Stoyanov
Daniel Elson
Ferdinando Rodriguez y Baena
Anh Nguyen
26
2
0
23 Aug 2024
Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey
Qika Lin
Yifan Zhu
Xin Mei
Ling Huang
Jingying Ma
Kai He
Zhen Peng
Erik Cambria
Mengling Feng
32
16
0
23 Aug 2024
Semantic Alignment for Multimodal Large Language Models
Tao Wu
Mengze Li
Jingyuan Chen
Wei Ji
Wang Lin
Jinyang Gao
Kun Kuang
Zhou Zhao
Fei Wu
30
3
0
23 Aug 2024
Sapiens: Foundation for Human Vision Models
Rawal Khirodkar
Timur M. Bagautdinov
Julieta Martinez
Su Zhaoen
Austin James
Peter Selednik
Stuart Anderson
Shunsuke Saito
VLM
31
63
0
22 Aug 2024
Open-Ended 3D Point Cloud Instance Segmentation
Phuc D. A. Nguyen
Minh Luu
Anh Tran
Cuong Pham
Khoi Nguyen
3DPC
40
1
0
21 Aug 2024
Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework
Jiandong Jin
Xiao Wang
Qian Zhu
Haiyang Wang
Chenglong Li
VLM
18
4
0
19 Aug 2024
5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks
Dongshuo Yin
Leiyi Hu
Bin Li
Youqun Zhang
Xue Yang
24
6
0
15 Aug 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
59
6
0
13 Aug 2024
Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes
Ke Zhou
Zhongwei Qiu
Dongmei Fu
VLM
27
1
0
12 Aug 2024
Efficient Test-Time Prompt Tuning for Vision-Language Models
Yuhan Zhu
Guozhen Zhang
Chen Xu
Haocheng Shen
Xiaoxin Chen
Gangshan Wu
Limin Wang
VLM
27
2
0
11 Aug 2024
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
Yifan Pu
Zhuofan Xia
Jiayi Guo
Dongchen Han
Qixiu Li
...
Ji Li
Yizeng Han
Shiji Song
Gao Huang
Xiu Li
53
11
0
11 Aug 2024
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
Haider Al-Tahan
Q. Garrido
Randall Balestriero
Diane Bouchacourt
C. Hazirbas
Mark Ibrahim
VLM
44
10
0
09 Aug 2024
How Well Can Vision Language Models See Image Details?
Chenhui Gou
Abdulwahab Felemban
Faizan Farooq Khan
Deyao Zhu
Jianfei Cai
Hamid Rezatofighi
Mohamed Elhoseiny
VLM
MLLM
47
4
0
07 Aug 2024
A Novel Evaluation Framework for Image2Text Generation
Jia-Hong Huang
Hongyi Zhu
Yixian Shen
S. Rudinac
A. M. Pacces
Evangelos Kanoulas
29
7
0
03 Aug 2024
Multi-Frame Vision-Language Model for Long-form Reasoning in Driver Behavior Analysis
Hiroshi Takato
Hiroshi Tsutsui
Komei Soda
Hidetaka Kamigaito
VLM
16
0
0
03 Aug 2024
EZSR: Event-based Zero-Shot Recognition
Yan Yang
Sehwan Kim
Dongxu Li
Y. Sun
26
0
0
31 Jul 2024
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving
Peiru Zheng
Yun Zhao
Zhan Gong
Hong Zhu
Shaohua Wu
MLLM
19
6
0
31 Jul 2024
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models
Ali Abdollahi
Mahdi Ghaznavi
Mohammad Reza Karimi Nejad
Arash Mari Oriyad
Reza Abbasi
Ali Salesi
Melika Behjati
M. Rohban
M. Baghshah
CoGe
26
1
0
30 Jul 2024
UniProcessor: A Text-induced Unified Low-level Image Processor
Huiyu Duan
Xiongkuo Min
Sijing Wu
Wei Shen
Guangtao Zhai
DiffM
34
8
0
30 Jul 2024
Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation
Jingjun Yi
Qi Bi
Hao Zheng
Haolan Zhan
Wei Ji
Yawen Huang
Yuexiang Li
Yefeng Zheng
18
8
0
26 Jul 2024
Cost-effective Instruction Learning for Pathology Vision and Language Analysis
Kaitao Chen
Mianxin Liu
Fang Yan
Lei Ma
Xiaoming Shi
...
Xiaosong Wang
Lifeng Zhu
Zhe Wang
Mu Zhou
Shaoting Zhang
30
3
0
25 Jul 2024
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li
Junfeng Wu
Weizhi Zhao
Song Bai
Xiang Bai
25
0
0
23 Jul 2024
QPT V2: Masked Image Modeling Advances Visual Scoring
Qizhi Xie
Kun Yuan
Yunpeng Qu
Mingda Wu
Ming-hui Sun
Chao Zhou
Jihong Zhu
19
3
0
23 Jul 2024
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Pengfei Chen
Lingxi Xie
Xinyue Huo
Xuehui Yu
Xiaopeng Zhang
Yingfei Sun
Zhenjun Han
Qi Tian
VLM
53
1
0
23 Jul 2024