Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.07636
Cited By
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"
50 / 507 papers shown
Title
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
64
41
0
23 May 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
34
2
0
22 May 2024
Influence of Water Droplet Contamination for Transparency Segmentation
Volker Knauthe
Paul Weitz
Thomas Pollabauer
Tristan Wirth
Arne Rak
Arjan Kuijper
Dieter W. Fellner
28
1
0
21 May 2024
OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models
Zhaojian Yu
Yinghao Wu
Zhuotao Deng
Yansong Tang
Xiao-Ping Zhang
39
2
0
21 May 2024
Hierarchical Selective Classification
Shani Goren
Ido Galil
Ran El-Yaniv
BDL
44
1
0
19 May 2024
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
39
44
0
17 May 2024
Compressive Feature Selection for Remote Visual Multi-Task Inference
Saeed Ranjbar Alvar
Ivan V. Bajić
20
0
0
15 May 2024
FreeVA: Offline MLLM as Training-Free Video Assistant
Wenhao Wu
VLM
OffRL
32
19
0
13 May 2024
EVA-X: A Foundation Model for General Chest X-ray Analysis with Self-supervised Learning
Jingfeng Yao
Xinggang Wang
Yuehao Song
Huangxuan Zhao
Jun Ma
Yajie Chen
Wenyu Liu
Bo Wang
ViT
21
5
0
08 May 2024
Selective Classification Under Distribution Shifts
Hengyue Liang
Le Peng
Ju Sun
UQCV
31
1
0
08 May 2024
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul
Zhizhong Li
Hao-Yu Yang
Yonatan Dukler
Ashwin Swaminathan
C. Taylor
Stefano Soatto
HILM
46
15
0
08 May 2024
Auto-Encoding Morph-Tokens for Multimodal LLM
Kaihang Pan
Siliang Tang
Juncheng Li
Zhaoyu Fan
Wei Chow
Shuicheng Yan
Tat-Seng Chua
Yueting Zhuang
Hanwang Zhang
MLLM
28
16
0
03 May 2024
Multi-modal Learnable Queries for Image Aesthetics Assessment
Zhiwei Xiong
Yunfan Zhang
Zhiqi Shen
Peiran Ren
Han Yu
EGVM
28
1
0
02 May 2024
Towards Incremental Learning in Large Language Models: A Critical Review
M. Jovanovic
Peter Voss
ELM
CLL
KELM
26
5
0
28 Apr 2024
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Enxin Song
Wenhao Chai
Tianbo Ye
Jenq-Neng Hwang
Xi Li
Gaoang Wang
VLM
MLLM
22
28
0
26 Apr 2024
Leveraging Large Language Models for Multimodal Search
Oriol Barbany
Michael Huang
Xinliang Zhu
Arnab Dhua
20
8
0
24 Apr 2024
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Young Kyun Jang
Donghyun Kim
Zihang Meng
Dat Huynh
Ser-Nam Lim
35
11
0
23 Apr 2024
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
36
20
0
22 Apr 2024
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering
Dongze Hao
Qunbo Wang
Longteng Guo
Jie Jiang
Jing Liu
31
0
0
22 Apr 2024
EventLens: Leveraging Event-Aware Pretraining and Cross-modal Linking Enhances Visual Commonsense Reasoning
Mingjie Ma
Zhihuan Yu
Yichao Ma
Guohui Li
LRM
33
1
0
22 Apr 2024
Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers
Georgios Pantazopoulos
Alessandro Suglia
Oliver Lemon
Arash Eshghi
VLM
16
4
0
21 Apr 2024
Dynamic in Static: Hybrid Visual Correspondence for Self-Supervised Video Object Segmentation
Gensheng Pei
Yazhou Yao
Jianbo Jiao
Wenguan Wang
Liqiang Nie
Jinhui Tang
VOS
32
1
0
21 Apr 2024
BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu
Yushi Hu
Bangzheng Li
Yu Feng
Haoyu Wang
Xudong Lin
Dan Roth
Noah A. Smith
Wei-Chiu Ma
Ranjay Krishna
VLM
LRM
MLLM
41
107
0
18 Apr 2024
Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors
Joao Luzio
Alexandre Bernardino
Plinio Moreno
22
0
0
16 Apr 2024
MEEL: Multi-Modal Event Evolution Learning
Zhengwei Tao
Zhi Jin
Junqiang Huang
Xiancai Chen
Xiaoying Bai
Haiyan Zhao
Yifan Zhang
Chongyang Tao
21
1
0
16 Apr 2024
HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision
Siddhant Bansal
Michael Wray
Dima Damen
31
3
0
15 Apr 2024
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
Jihao Liu
Jinliang Zheng
Yu Liu
Hongsheng Li
VLM
19
3
0
11 Apr 2024
BRAVE: Broadening the visual encoding of vision-language models
Ouguzhan Fatih Kar
A. Tonioni
Petra Poklukar
Achin Kulshrestha
Amir Zamir
Federico Tombari
MLLM
VLM
42
25
0
10 Apr 2024
SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving
Diankun Zhang
Guoan Wang
Runwen Zhu
Jianbo Zhao
Xiwu Chen
...
Haotian Yao
Chi Zhang
Xiaojun Liu
Xiaoguang Di
Bin Li
26
10
0
10 Apr 2024
Monocular 3D lane detection for Autonomous Driving: Recent Achievements, Challenges, and Outlooks
Fulong Ma
Weiqing Qi
Guoyang Zhao
Linwei Zheng
Sheng Wang
Yuxuan Liu
Ming-Yu Liu
68
8
0
10 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
34
20
0
09 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
81
87
0
08 Apr 2024
RoboMP
2
^2
2
: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models
Qi Lv
Haochuan Li
Xiang Deng
Rui Shao
Michael Yu Wang
Liqiang Nie
LRM
LM&Ro
27
1
0
07 Apr 2024
Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation
Xiaoshuang Huang
Hongxiang Li
Meng Cao
Long Chen
Chenyu You
Dong An
VLM
41
5
0
03 Apr 2024
What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases
A. M. H. Tiong
Junqi Zhao
Boyang Albert Li
Junnan Li
S. Hoi
Caiming Xiong
40
7
0
03 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jienneg Chen
Qihang Yu
Xiaohui Shen
Alan L. Yuille
Liang-Chieh Chen
3DV
VLM
28
24
0
02 Apr 2024
Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
Rongjie Li
Yu Wu
Xuming He
MLLM
LRM
VLM
16
2
0
01 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
37
5
0
28 Mar 2024
Toward Interactive Regional Understanding in Vision-Large Language Models
Jungbeom Lee
Sanghyuk Chun
Sangdoo Yun
VLM
16
1
0
27 Mar 2024
Elysium: Exploring Object-level Perception in Videos via MLLM
Hang Wang
Yanjie Wang
Yongjie Ye
Yuxiang Nie
Can Huang
MLLM
32
18
0
25 Mar 2024
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
Reza Esfandiarpoor
Cristina Menghini
Stephen H. Bach
CoGe
VLM
27
8
0
25 Mar 2024
A Multimodal Approach for Cross-Domain Image Retrieval
Lucas Iijima
Tania Stathaki
29
1
0
22 Mar 2024
MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation
Longzheng Wang
Xiaohan Xu
Lei Zhang
Jiarui Lu
Yongxiu Xu
Hongbo Xu
Minghao Tang
Chuang Zhang
30
4
0
21 Mar 2024
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Théophane Vallaeys
Mustafa Shukor
Matthieu Cord
Jakob Verbeek
54
12
0
20 Mar 2024
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue
Jie Cheng
Longteng Guo
Xingyuan Dai
Zijia Zhao
Xingjian He
Gang Xiong
Yisheng Lv
Jing Liu
38
9
0
20 Mar 2024
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLM
LRM
44
40
0
19 Mar 2024
VisualCritic: Making LMMs Perceive Visual Quality Like Humans
Zhipeng Huang
Zhizheng Zhang
Yiting Lu
Zheng-Jun Zha
Zhibo Chen
Baining Guo
MLLM
47
11
0
19 Mar 2024
ViTGaze: Gaze Following with Interaction Features in Vision Transformers
Yuehao Song
Xinggang Wang
Jingfeng Yao
Wenyu Liu
Jinglin Zhang
Xiangmin Xu
ViT
31
1
0
19 Mar 2024
Fusion Transformer with Object Mask Guidance for Image Forgery Analysis
Dimitrios Karageorgiou
Giorgos Kordopatis-Zilos
Symeon Papadopoulos
ViT
20
5
0
18 Mar 2024
Better (pseudo-)labels for semi-supervised instance segmentation
Franccois Porcher
Camille Couprie
Marc Szafraniec
Jakob Verbeek
ISeg
15
0
0
18 Mar 2024
Previous
1
2
3
4
5
6
...
9
10
11
Next