Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
All Papers
0 / 0 papers shown
Title
Home
Papers
2211.07636
Cited By
v1
v2 (latest)
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Computer Vision and Pattern Recognition (CVPR), 2022
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (1 upvotes)
Github (2496★)
Papers citing
"EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"
50 / 579 papers shown
Title
Can Large Multimodal Models Uncover Deep Semantics Behind Images?
Yixin Yang
Zheng Li
Qingxiu Dong
Heming Xia
Zhifang Sui
VLM
168
19
0
17 Feb 2024
II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
Jihyung Kil
Farideh Tavazoee
Luan Tuyen Chau
Joo-Kyung Kim
LRM
190
6
0
16 Feb 2024
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
David Romero
Thamar Solorio
280
5
0
16 Feb 2024
Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance
Linxi Zhao
Yihe Deng
Weitong Zhang
Q. Gu
MLLM
297
30
0
13 Feb 2024
VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Dongsheng Zhu
Xunzhu Tang
Weidong Han
Jinghui Lu
Yukun Zhao
Guoliang Xing
Junfeng Wang
D. Yin
VLM
MLLM
285
16
0
12 Feb 2024
Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy
International Conference on Learning Representations (ICLR), 2024
Simon Ging
M. A. Bravo
Thomas Brox
VLM
384
19
0
11 Feb 2024
Large Language Models for Captioning and Retrieving Remote Sensing Images
João Daniel Silva
João Magalhães
D. Tuia
Bruno Martins
188
39
0
09 Feb 2024
Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images
Kathleen C. Fraser
S. Kiritchenko
246
62
0
08 Feb 2024
Question Aware Vision Transformer for Multimodal Reasoning
Roy Ganz
Yair Kittenplon
Aviad Aberdam
Elad Ben Avraham
Oren Nuriel
Shai Mazor
Ron Litman
280
36
0
08 Feb 2024
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Quan-Sen Sun
Jinsheng Wang
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Xinlong Wang
VLM
CLIP
MLLM
288
75
0
06 Feb 2024
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
International Conference on Machine Learning (ICML), 2024
Yang Jin
Zhicheng Sun
Kun Xu
Kun Xu
Liwei Chen
...
Yuliang Liu
Chen Zhang
Yang Song
Kun Gai
Yadong Mu
VGen
229
77
0
05 Feb 2024
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives
IEEE Transactions on Intelligent Vehicles (TIV), 2024
Sheng Luo
Wei Chen
Wanxin Tian
Rui Liu
Luanxuan Hou
...
Ling Shao
Yi Yang
Bojun Gao
Qun Li
Guobin Wu
395
28
0
05 Feb 2024
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
Ziyu Ma
Shutao Li
Bin Sun
Jianfei Cai
Zuxiang Long
Fuyan Ma
244
8
0
04 Feb 2024
Region-Based Representations Revisited
Michal Shlapentokh-Rothman
Ansel Blume
Yao Xiao
Yuqun Wu
TV Sethuraman
Heyi Tao
Jae Yong Lee
Wilfredo Torres
Yu-Xiong Wang
Derek Hoiem
444
14
0
04 Feb 2024
Can MLLMs Perform Text-to-Image In-Context Learning?
Yuchen Zeng
Wonjun Kang
Yicong Chen
Hyung Il Koo
Kangwook Lee
MLLM
247
14
0
02 Feb 2024
Hybrid Quantum Vision Transformers for Event Classification in High Energy Physics
Eyup B. Unlu
Marçal Comajoan Cara
Gopal Ramesh Dahale
Zhongtian Dong
Roy T. Forestano
...
Daniel Justice
Kyoungchul Kong
Tom Magorsch
Konstantin T. Matchev
Katia Matcheva
278
13
0
01 Feb 2024
ControlCap: Controllable Region-level Captioning
Yuzhong Zhao
Yue Liu
Zonghao Guo
Weijia Wu
Chen Gong
Fang Wan
QiXiang Ye
396
14
0
31 Jan 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
Florentin Wörgötter
Alexander S. Ecker
396
14
0
29 Jan 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Sijin Yu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Yuan Liu
VLM
MLLM
354
342
0
29 Jan 2024
VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models
Yi Zhao
Yilin Zhang
Rong Xiang
Jing Li
Hillming Li
313
26
0
29 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
484
331
0
24 Jan 2024
STICKERCONV: Generating Multimodal Empathetic Responses from Scratch
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yiqun Zhang
Fanheng Kong
Peidong Wang
Shuang Sun
Lingshuai Wang
Shi Feng
Daling Wang
Yifei Zhang
Kaisong Song
179
31
0
20 Jan 2024
Image Safeguarding: Reasoning with Conditional Vision Language Model and Obfuscating Unsafe Content Counterfactually
AAAI Conference on Artificial Intelligence (AAAI), 2024
Mazal Bethany
Brandon Wherry
Nishant Vishwamitra
Peyman Najafirad
DiffM
134
8
0
19 Jan 2024
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering
Haibo Wang
Chenghang Lai
Yixuan Sun
Weifeng Ge
364
12
0
19 Jan 2024
OMG-Seg: Is One Model Good Enough For All Segmentation?
Xiangtai Li
Haobo Yuan
Wei Li
Henghui Ding
Size Wu
Wenwei Zhang
Yining Li
Kai Chen
Chen Change Loy
VLM
MLLM
ViT
302
103
0
18 Jan 2024
Supervised Fine-tuning in turn Improves Visual Foundation Models
Xiaohu Jiang
Yixiao Ge
Yuying Ge
Dachuan Shi
Chun Yuan
Ying Shan
VLM
CLIP
226
13
0
18 Jan 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Changyao Tian
Xizhou Zhu
Yuwen Xiong
Weiyun Wang
Zhe Chen
...
Tong Lu
Jie Zhou
Jiaming Song
Yu Qiao
Jifeng Dai
AuLLM
235
70
0
18 Jan 2024
SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
Yangfan Zhan
Zhitong Xiong
Yuan. Yuan
MLLM
239
111
0
18 Jan 2024
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
International Conference on Machine Learning (ICML), 2024
Lianghui Zhu
Bencheng Liao
Qian Zhang
Xinlong Wang
Wenyu Liu
Xinggang Wang
Mamba
443
1,344
0
17 Jan 2024
Beyond Anti-Forgetting: Multimodal Continual Instruction Tuning with Positive Forward Transfer
Junhao Zheng
Qianli Ma
Zhen Liu
Binquan Wu
Hu Feng
CLL
317
25
0
17 Jan 2024
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
European Conference on Computer Vision (ECCV), 2024
Bowen Shi
Peisen Zhao
Zichen Wang
Yuhang Zhang
Yaoming Wang
...
Wenrui Dai
Junni Zou
Hongkai Xiong
Qi Tian
Xiaopeng Zhang
VLM
176
12
0
12 Jan 2024
Video Anomaly Detection and Explanation via Large Language Models
Hui Lv
Qianru Sun
242
49
0
11 Jan 2024
Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics
Beiwen Tian
Huan-ang Gao
Leiyao Cui
Yupeng Zheng
Lan Luo
Baofeng Wang
Rong Zhi
Guyue Zhou
Hao Zhao
207
6
0
10 Jan 2024
Revisiting Adversarial Training at Scale
Computer Vision and Pattern Recognition (CVPR), 2024
Zeyu Wang
Xianhang Li
Hongru Zhu
Cihang Xie
407
30
0
09 Jan 2024
Effective pruning of web-scale datasets based on complexity of concept clusters
International Conference on Learning Representations (ICLR), 2024
Amro Abbas
E. Rusak
Kushal Tirumala
Wieland Brendel
Kamalika Chaudhuri
Ari S. Morcos
VLM
CLIP
285
28
0
09 Jan 2024
Denoising Vision Transformers
Jiawei Yang
Katie Z Luo
Jie Li
Kilian Q. Weinberger
Yonglong Tian
Yue Wang
DiffM
218
29
0
05 Jan 2024
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
Computer Vision and Pattern Recognition (CVPR), 2024
Yiran Song
Qianyu Zhou
Hefei Ling
Deng-Ping Fan
Xuequan Lu
Lizhuang Ma
VLM
490
20
0
04 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
275
28
0
31 Dec 2023
FerKD: Surgical Label Adaptation for Efficient Distillation
IEEE International Conference on Computer Vision (ICCV), 2023
Zhiqiang Shen
253
4
0
29 Dec 2023
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Chenliang Xu
Jiebo Luo
Chenliang Xu
VLM
683
162
0
29 Dec 2023
Learning Vision from Models Rivals Learning Vision from Data
Computer Vision and Pattern Recognition (CVPR), 2023
Yonglong Tian
Lijie Fan
Kaifeng Chen
Dina Katabi
Dilip Krishnan
Phillip Isola
266
72
0
28 Dec 2023
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices
Xiangxiang Chu
Limeng Qiao
Xinyang Lin
Shuang Xu
Yang Yang
...
Fei Wei
Xinyu Zhang
Bo Zhang
Xiaolin Wei
Chunhua Shen
MLLM
292
68
0
28 Dec 2023
ChartBench: A Benchmark for Complex Visual Reasoning in Charts
Zhengzhuo Xu
Sinan Du
Yiyan Qi
Chengjin Xu
Chun Yuan
Jian Guo
408
85
0
26 Dec 2023
FoodLMM: A Versatile Food Assistant using Large Multi-modal Model
Yuehao Yin
Huiyan Qi
B. Zhu
Yue Yu
Yu-Gang Jiang
Chong-Wah Ngo
256
39
0
22 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
614
2,148
0
21 Dec 2023
GSVA: Generalized Segmentation via Multimodal Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Zhuofan Xia
Dongchen Han
Yizeng Han
Xuran Pan
Shiji Song
Gao Huang
VLM
595
125
0
15 Dec 2023
General Object Foundation Model for Images and Videos at Scale
Computer Vision and Pattern Recognition (CVPR), 2023
Junfeng Wu
Yi Jiang
Qihao Liu
Zehuan Yuan
Xiang Bai
Song Bai
VOS
VLM
328
77
0
14 Dec 2023
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
Wenhai Wang
Jiangwei Xie
ChuanYang Hu
Haoming Zou
Jianan Fan
Wenwen Tong
Yang Wen
Silei Wu
Hanming Deng
Zhiqi Li
351
216
0
14 Dec 2023
ViLA: Efficient Video-Language Alignment for Video Question Answering
European Conference on Computer Vision (ECCV), 2023
Xijun Wang
Junbang Liang
Chun-Kai Wang
Kenan Deng
Yu Lou
Ming-Chyuan Lin
Shan Yang
305
21
0
13 Dec 2023
Building Universal Foundation Models for Medical Image Analysis with Spatially Adaptive Networks
Lingxiao Luo
Xuanzhong Chen
Bingda Tang
Xinsheng Chen
Rong Han
Chengpeng Hu
Yujiang Li
Ting Chen
MedIm
208
3
0
12 Dec 2023
Previous
1
2
3
...
10
11
12
7
8
9
Next