2211.07636
Cited By
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
Papers citing
"EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"
50 / 507 papers shown
Title
Improving Pixel-based MIM by Reducing Wasted Modeling Capability
Yuan Liu
Songyang Zhang
Jiacheng Chen
Zhaohui Yu
Kai-xiang Chen
Dahua Lin
16
27
0
01 Aug 2023
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song
Wenhao Chai
Guanhong Wang
Yucheng Zhang
Haoyang Zhou
...
Tianbo Ye
Yanting Zhang
Yang Lu
Jenq-Neng Hwang
Gaoang Wang
VLM
MLLM
17
259
0
31 Jul 2023
CLIP Brings Better Features to Visual Aesthetics Learners
Liwu Xu
Jinjin Xu
Yuzhe Yang
Yi-Jie Huang
Yanchun Xie
Yaqian Li
VLM
17
3
0
28 Jul 2023
Human-centric Scene Understanding for 3D Large-scale Scenarios
Yiteng Xu
Peishan Cong
Yichen Yao
Runnan Chen
Yuenan Hou
Xinge Zhu
Xuming He
Jingyi Yu
Yuexin Ma
3DV
24
23
0
26 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
18
116
0
25 Jul 2023
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang
Zhulin An
Libo Huang
Junyu Bi
Xinqiang Yu
Hansheng Yang
Boyu Diao
Yongjun Xu
VLM
16
25
0
24 Jul 2023
COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts
Xiaofeng Mao
YueFeng Chen
Yao Zhu
Da Chen
Hang Su
Rong Zhang
H. Xue
ObjD
OOD
21
18
0
24 Jul 2023
GEM: Boost Simple Network for Glass Surface Segmentation via Vision Foundation Models
Jing Hao
Xinyu Li
Liang Gao
Shumin Han
VLM
DiffM
14
2
0
22 Jul 2023
CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots
D. Rivkin
Nikhil Kakodkar
F. Hogan
Bobak H. Baghi
Gregory Dudek
LM&Ro
11
3
0
21 Jul 2023
Watch out Venomous Snake Species: A Solution to SnakeCLEF2023
Feiran Hu
P. Wang
Yangyang Li
Chenlong Duan
Zijian Zhu
Fei Wang
Faen Zhang
Yong Li
Xiu-Shen Wei
24
6
0
19 Jul 2023
MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results
Yuki Kondo
Norimichi Ukita
Takayuki Yamaguchi
Haoran Hou
Mu-Yi Shen
...
Ichiro Ide
Yosuke Shinya
Xinyao Liu
Guang Liang
S. Yasui
16
13
0
18 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM
MLLM
19
25
0
13 Jul 2023
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
Muhammad Uzair Khattak
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Ming Yang
F. Khan
VLM
17
162
0
13 Jul 2023
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle
Abhay Jain
Radu Timofte
Goran Glavaš
VLM
MLLM
13
29
0
13 Jul 2023
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
16
71
0
05 Jul 2023
Surgical fine-tuning for Grape Bunch Segmentation under Visual Domain Shifts
Agnese Chiatti
R. Bertoglio
Nicolás Catalano
Matteo Gatti
Matteo Matteucci
10
4
0
03 Jul 2023
Stitched ViTs are Flexible Vision Backbones
Zizheng Pan
Jing Liu
Haoyu He
Jianfei Cai
Bohan Zhuang
11
2
0
30 Jun 2023
End-to-end Autonomous Driving: Challenges and Frontiers
Li Chen
Peng Wu
Kashyap Chitta
Bernhard Jaeger
Andreas Geiger
Hongyang Li
3DV
30
260
0
29 Jun 2023
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
William Berrios
Gautam Mittal
Tristan Thrush
Douwe Kiela
Amanpreet Singh
MLLM
VLM
11
59
0
28 Jun 2023
Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners
Bowen Shi
Xiaopeng Zhang
Yaoming Wang
Jin Li
Wenrui Dai
Junni Zou
H. Xiong
Qi Tian
22
3
0
28 Jun 2023
Are aligned neural networks adversarially aligned?
Nicholas Carlini
Milad Nasr
Christopher A. Choquette-Choo
Matthew Jagielski
Irena Gao
...
Pang Wei Koh
Daphne Ippolito
Katherine Lee
Florian Tramèr
Ludwig Schmidt
AAML
18
221
0
26 Jun 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Bill Xu
Enhong Chen
MLLM
LRM
33
551
0
23 Jun 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
Xiangyu Qi
Kaixuan Huang
Ashwinee Panda
Peter Henderson
Mengdi Wang
Prateek Mittal
AAML
17
136
0
22 Jun 2023
Pushing the Limits of 3D Shape Generation at Scale
Wang Yu
Xuelin Qian
Jingyang Huo
Tiejun Huang
Bo-Lu Zhao
Yanwei Fu
21
11
0
20 Jun 2023
Path to Medical AGI: Unify Domain-specific Medical LLMs with the Lowest Cost
Juexiao Zhou
Xiuying Chen
Xin Gao
LM&MA
AI4CE
82
12
0
19 Jun 2023
Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions
Dongshuo Yin
Xueting Han
Bin Li
Hao Feng
Jinghua Bai
VPVLM
26
16
0
16 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng-Tao Xu
Wenqi Shao
Kaipeng Zhang
Peng Gao
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM
MLLM
23
158
0
15 Jun 2023
Transferring Knowledge for Food Image Segmentation using Transformers and Convolutions
Grant Sinha
Krishna Parmar
Hilda Azimi
Chi-en Amy Tai
Yuhao Chen
A. Wong
Pengcheng Xi
ViT
15
4
0
15 Jun 2023
MOFI: Learning Image Representations from Noisy Entity Annotated Images
Wentao Wu
Aleksei Timofeev
Chen Chen
Bowen Zhang
Kun Duan
...
Yantao Zheng
Jonathon Shlens
Xianzhi Du
Zhe Gan
Yinfei Yang
VLM
10
7
0
13 Jun 2023
VISION Datasets: A Benchmark for Vision-based InduStrial InspectiON
Haoping Bai
Shancong Mou
Tatiana Likhomanenko
R. G. Cinbis
Oncel Tuzel
Ping-Chia Huang
Jiulong Shan
Jianjun Shi
Mengsi Cao
VLM
6
23
0
13 Jun 2023
Scalable 3D Captioning with Pretrained Models
Tiange Luo
C. Rockwell
Honglak Lee
Justin Johnson
11
151
0
12 Jun 2023
Beyond Detection: Visual Realism Assessment of Deepfakes
Luka Dragar
Peter Peer
Vitomir Štruc
Borut Batagelj
11
3
0
09 Jun 2023
Customizing General-Purpose Foundation Models for Medical Report Generation
Bang-ju Yang
Asif Raza
Yuexian Zou
Tong Zhang
MedIm
11
11
0
09 Jun 2023
Large-scale Dataset Pruning with Dynamic Uncertainty
Muyang He
Shuo Yang
Tiejun Huang
Bo-Lu Zhao
20
25
0
08 Jun 2023
Fine-Grained Visual Prompting
Lingfeng Yang
Yueze Wang
Xiang Li
Xinlong Wang
Jian Yang
ObjD
VLM
16
59
0
07 Jun 2023
Semantic Segmentation on VSPW Dataset through Contrastive Loss and Multi-dataset Training Approach
Min Yan
Qianxiong Ning
Qian Wang
12
1
0
06 Jun 2023
Adversarial alignment: Breaking the trade-off between the strength of an attack and its relevance to human perception
Drew Linsley
Pinyuan Feng
Thibaut Boissin
A. Ashok
Thomas Fel
Stephanie Olaiya
Thomas Serre
AAML
20
6
0
05 Jun 2023
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Hang Zhang
Xin Li
Lidong Bing
MLLM
11
944
0
05 Jun 2023
Revisiting the Role of Language Priors in Vision-Language Models
Zhiqiu Lin
Xinyue Chen
Deepak Pathak
Pengchuan Zhang
Deva Ramanan
VLM
15
22
0
02 Jun 2023
Consistency-guided Prompt Learning for Vision-Language Models
Shuvendu Roy
Ali Etemad
VLM
VPVLM
10
50
0
01 Jun 2023
StyleGAN knows Normal, Depth, Albedo, and More
Anand Bhattad
Daniel McKee
Derek Hoiem
David A. Forsyth
GAN
18
33
0
01 Jun 2023
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Yizhi Li
Ruibin Yuan
Ge Zhang
Yi Ma
Xingran Chen
...
Yemin Shi
Wen-Fen Huang
Zili Wang
Yi-Ting Guo
Jie Fu
17
104
0
31 May 2023
AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation
Chuhao Jin
Wenhui Tan
Jiange Yang
Bei Liu
Ruihua Song
Limin Wang
Jianlong Fu
LM&Ro
LRM
17
24
0
30 May 2023
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
Dachuan Shi
Chaofan Tao
Anyi Rao
Zhendong Yang
Chun Yuan
Jiaqi Wang
VLM
23
22
0
27 May 2023
ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers
J. Yao
Xinggang Wang
Shusheng Yang
Baoyuan Wang
ViT
14
57
0
24 May 2023
Delving Deeper into Data Scaling in Masked Image Modeling
Cheng Lu
Xiaojie Jin
Qibin Hou
Jun Hao Liew
Ming-Ming Cheng
Jiashi Feng
17
2
0
24 May 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
Yao Mu
Qinglong Zhang
Mengkang Hu
Wen Wang
Mingyu Ding
Jun Jin
Bin Wang
Jifeng Dai
Yu Qiao
Ping Luo
LM&Ro
LRM
17
212
0
24 May 2023
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
Dongxu Li
Junnan Li
Steven C. H. Hoi
17
298
0
24 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
92
76
0
22 May 2023
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending
Xingjian He
Sihan Chen
Fan Ma
Zhicheng Huang
Xiaojie Jin
Zikang Liu
Dongmei Fu
Yi Yang
J. Liu
Jiashi Feng
VLM
CLIP
18
17
0
22 May 2023