2302.05442
Scaling Vision Transformers to 22 Billion Parameters
10 February 2023
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
Justin Gilmer
Andreas Steiner
Mathilde Caron
Robert Geirhos
Ibrahim M. Alabdulmohsin
Rodolphe Jenatton
Lucas Beyer
Michael Tschannen
Anurag Arnab
Xiao Wang
C. Riquelme
Matthias Minderer
J. Puigcerver
Utku Evci
Manoj Kumar
Sjoerd van Steenkiste
Gamaleldin F. Elsayed
Aravindh Mahendran
Fisher Yu
Avital Oliver
Fantine Huot
Jasmijn Bastings
Mark Collier
A. Gritsenko
Vighnesh Birodkar
C. N. Vasconcelos
Yi Tay
Thomas Mensink
Alexander Kolesnikov
Filip Pavetić
Dustin Tran
Thomas Kipf
Mario Lučić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
Papers citing "Scaling Vision Transformers to 22 Billion Parameters"
50 / 416 papers shown
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Mu Cai
Zeyi Huang
Yuheng Li
Utkarsh Ojha
Haohan Wang
Yong Jae Lee
VLM
14
2
0
09 Jun 2023
R-MAE: Regions Meet Masked Autoencoders
Duy-Kien Nguyen
Vaibhav Aggarwal
Yanghao Li
Martin R. Oswald
Alexander Kirillov
Cees G. M. Snoek
Xinlei Chen
TPM
14
10
0
08 Jun 2023
Performance-optimized deep neural networks are evolving into worse models of inferotemporal visual cortex
Drew Linsley
I. F. Rodriguez
Thomas Fel
Michael Arcaro
Saloni Sharma
Margaret Livingstone
Thomas Serre
22
18
0
06 Jun 2023
Deep Learning for Day Forecasts from Sparse Observations
Marcin Andrychowicz
L. Espeholt
Di Li
Samier Merchant
Alexander Merose
Fred Zyda
Shreya Agrawal
Nal Kalchbrenner
AI4Cl
23
60
0
06 Jun 2023
Adversarial alignment: Breaking the trade-off between the strength of an attack and its relevance to human perception
Drew Linsley
Pinyuan Feng
Thibaut Boissin
A. Ashok
Thomas Fel
Stephanie Olaiya
Thomas Serre
AAML
20
6
0
05 Jun 2023
Memorization Capacity of Multi-Head Attention in Transformers
Sadegh Mahdavi
Renjie Liao
Christos Thrampoulidis
11
22
0
03 Jun 2023
White-Box Transformers via Sparse Rate Reduction
Yaodong Yu
Sam Buchanan
Druv Pai
Tianzhe Chu
Ziyang Wu
Shengbang Tong
B. Haeffele
Y. Ma
ViT
16
80
0
01 Jun 2023
AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation
Chuhao Jin
Wenhui Tan
Jiange Yang
Bei Liu
Ruihua Song
Limin Wang
Jianlong Fu
LM&Ro
LRM
17
24
0
30 May 2023
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Xi Chen
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Soravit Changpinyo
...
Mojtaba Seyedhosseini
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
VLM
44
186
0
29 May 2023
Manifold Regularization for Memory-Efficient Training of Deep Neural Networks
Shadi Sartipi
Edgar A. Bernal
20
0
0
26 May 2023
Are Deep Neural Networks Adequate Behavioural Models of Human Visual Perception?
Felix Wichmann
Robert Geirhos
25
25
0
26 May 2023
Scaling Data-Constrained Language Models
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
16
195
0
25 May 2023
TOAST: Transfer Learning via Attention Steering
Baifeng Shi
Siyu Gai
Trevor Darrell
Xin Wang
14
9
0
24 May 2023
Delving Deeper into Data Scaling in Masked Image Modeling
Cheng Lu
Xiaojie Jin
Qibin Hou
Jun Hao Liew
Ming-Ming Cheng
Jiashi Feng
19
2
0
24 May 2023
HARD: Hard Augmentations for Robust Distillation
Arne F. Nix
Max F. Burg
Fabian H. Sinz
AAML
15
1
0
24 May 2023
Just CHOP: Embarrassingly Simple LLM Compression
A. Jha
Tom Sherborne
Evan Pete Walsh
Dirk Groeneveld
Emma Strubell
Iz Beltagy
20
3
0
24 May 2023
Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers
Zixuan Jiang
Jiaqi Gu
Hanqing Zhu
D. Pan
AI4CE
17
15
0
24 May 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng
Yixiao Ge
Zhan Tong
Xihui Liu
Shutao Xia
Ying Shan
24
9
0
23 May 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Ibrahim M. Alabdulmohsin
Xiaohua Zhai
Alexander Kolesnikov
Lucas Beyer
VLM
22
56
0
22 May 2023
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
Siyuan Huang
Zhengkai Jiang
Hao Dong
Yu Qiao
Peng Gao
Hongsheng Li
LM&Ro
22
91
0
18 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
16
113
0
18 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
32
89
0
14 May 2023
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
Hassan Akbari
Dan Kondratyuk
Yin Cui
Rachel Hornung
H. Wang
Hartwig Adam
VLM
MoE
20
11
0
10 May 2023
Visual Tuning
Bruce X. B. Yu
Jianlong Chang
Haixin Wang
Lin Liu
Shijie Wang
...
Lingxi Xie
Haojie Li
Zhouchen Lin
Qi Tian
Chang Wen Chen
VLM
39
38
0
10 May 2023
CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation
J. Heo
S. Azizi
A. Fayyazi
Massoud Pedram
23
0
0
08 May 2023
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
Feilong Chen
Minglun Han
Haozhi Zhao
Qingyang Zhang
Jing Shi
Shuang Xu
Bo Xu
MLLM
25
114
0
07 May 2023
AttentionViz: A Global View of Transformer Attention
Catherine Yeh
Yida Chen
Aoyu Wu
Cynthia Chen
Fernanda Viégas
Martin Wattenberg
ViT
14
51
0
04 May 2023
ZipIt! Merging Models from Different Tasks without Training
George Stoica
Daniel Bolya
J. Bjorner
Pratik Ramesh
Taylor N. Hearn
Judy Hoffman
VLM
MoMe
38
109
0
04 May 2023
Stable and low-precision training for large-scale vision-language models
Mitchell Wortsman
Tim Dettmers
Luke Zettlemoyer
Ari S. Morcos
Ali Farhadi
Ludwig Schmidt
MQ
MLLM
VLM
11
38
0
25 Apr 2023
Distilling from Similar Tasks for Transfer Learning on a Budget
Kenneth Borup
Cheng Perng Phoo
Bharath Hariharan
11
2
0
24 Apr 2023
STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training
Ziyan Huang
Hao Wang
Zhongying Deng
Jin Ye
Yanzhou Su
...
Junjun He
Yun Gu
Lixu Gu
Shaoting Zhang
Yu Qiao
12
74
0
13 Apr 2023
DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
Enze Xie
Lewei Yao
Han Shi
Zhili Liu
Daquan Zhou
Zhaoqiang Liu
Jiawei Li
Zhenguo Li
16
76
0
13 Apr 2023
A Billion-scale Foundation Model for Remote Sensing Images
Keumgang Cha
Junghoon Seo
Taekyung Lee
30
62
0
11 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
24
39
0
07 Apr 2023
Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
Nolan Dey
Gurpreet Gosal
Zhiming Chen
Hemant Khachane
William Marshall
Ribhu Pathria
Marvin Tom
Joel Hestness
MoE
LRM
25
98
0
06 Apr 2023
The Vector Grounding Problem
Dimitri Coelho Mollo
Raphael Milliere
23
25
0
04 Apr 2023
Exploring Vision-Language Models for Imbalanced Learning
Yidong Wang
Zhuohao Yu
Jindong Wang
Qiang Heng
Haoxing Chen
Wei Ye
Rui Xie
Xingxu Xie
Shi-Bo Zhang
VLM
18
30
0
04 Apr 2023
Specialty-Oriented Generalist Medical AI for Chest CT Screening
Chuang Niu
Qing Lyu
Christopher D. Carothers
P. Kaviani
Josh Tan
P. Yan
M. Kalra
C. Whitlow
Ge Wang
6
6
0
03 Apr 2023
Your Diffusion Model is Secretly a Zero-Shot Classifier
Alexander C. Li
Mihir Prabhudesai
Shivam Duggal
Ellis L Brown
Deepak Pathak
DiffM
VLM
33
221
0
28 Mar 2023
Text-to-Image Diffusion Models are Zero-Shot Classifiers
Kevin Clark
P. Jaini
DiffM
VLM
22
105
0
27 Mar 2023
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
J. Hernandez
Ruben Villegas
Vicente Ordonez
SSL
29
2
0
21 Mar 2023
Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu
Lin Li
Jiankai Sun
Jiachuan Peng
Peilun Shi
...
Bo Xiao
Wu Yuan
Ningli Wang
Dong Xu
Benny P. L. Lo
AI4MH
LM&MA
38
123
0
21 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM
VLM
24
29
0
20 Mar 2023
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
E. Azarnasab
Faisal Ahmed
Zicheng Liu
Ce Liu
Michael Zeng
Lijuan Wang
ReLM
KELM
LRM
15
365
0
20 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
38
258
0
20 Mar 2023
Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models
Zangwei Zheng
Mingyu Ma
Kai Wang
Ziheng Qin
Xiangyu Yue
Yang You
CLL
VLM
94
67
0
12 Mar 2023
Adapting Contrastive Language-Image Pretrained (CLIP) Models for Out-of-Distribution Detection
Nikolas Adaloglou
Félix D. P. Michels
Tim Kaiser
M. Kollmann
VLM
11
0
0
10 Mar 2023
InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning
Ziheng Qin
K. Wang
Zangwei Zheng
Jianyang Gu
Xiang Peng
...
Daquan Zhou
Lei Shang
Baigui Sun
Xuansong Xie
Yang You
116
44
0
08 Mar 2023
PaLM-E: An Embodied Multimodal Language Model
Danny Driess
F. Xia
Mehdi S. M. Sajjadi
Corey Lynch
Aakanksha Chowdhery
...
Marc Toussaint
Klaus Greff
Andy Zeng
Igor Mordatch
Peter R. Florence
LM&Ro
20
1,558
0
06 Mar 2023
OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System
Chao Xue
W. Liu
Shunxing Xie
Zhenfang Wang
Jiaxing Li
...
Shi-Yong Chen
Yibing Zhan
Jing Zhang
Chaoyue Wang
Dacheng Tao
27
1
0
01 Mar 2023