2302.05442
Scaling Vision Transformers to 22 Billion Parameters
10 February 2023
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
Justin Gilmer
Andreas Steiner
Mathilde Caron
Robert Geirhos
Ibrahim M. Alabdulmohsin
Rodolphe Jenatton
Lucas Beyer
Michael Tschannen
Anurag Arnab
Xiao Wang
C. Riquelme
Matthias Minderer
J. Puigcerver
Utku Evci
Manoj Kumar
Sjoerd van Steenkiste
Gamaleldin F. Elsayed
Aravindh Mahendran
Fisher Yu
Avital Oliver
Fantine Huot
Jasmijn Bastings
Mark Collier
A. Gritsenko
Vighnesh Birodkar
C. N. Vasconcelos
Yi Tay
Thomas Mensink
Alexander Kolesnikov
Filip Pavetić
Dustin Tran
Thomas Kipf
Mario Lučić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
Papers citing
"Scaling Vision Transformers to 22 Billion Parameters"
50 / 416 papers shown
Title
To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability
Joonhyung Lee
Jeongin Bae
Byeongwook Kim
S. Kwon
Dongsoo Lee
MQ
38
1
0
29 May 2024
A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts
Mohammed Nowaz Rabbani Chowdhury
Meng Wang
K. E. Maghraoui
Naigang Wang
Pin-Yu Chen
Christopher Carothers
MoE
24
4
0
26 May 2024
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters
Xinyu Zhou
Boris Knyazev
Alexia Jolicoeur-Martineau
Jie Fu
AI4CE
36
0
0
25 May 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
32
8
0
25 May 2024
The Road Less Scheduled
Aaron Defazio
Xingyu Yang
Harsh Mehta
Konstantin Mishchenko
Ahmed Khaled
Ashok Cutkosky
20
45
0
24 May 2024
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
Huy V. Vo
Vasil Khalidov
Timothée Darcet
Théo Moutakanni
Nikita Smetanin
...
Maxime Oquab
Armand Joulin
Hervé Jégou
Patrick Labatut
Piotr Bojanowski
SSL
48
18
0
24 May 2024
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Jacob Russin
Sam Whitman McGrath
Danielle J. Williams
Lotem Elber-Dorozko
AI4CE
61
2
0
24 May 2024
Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference
Ting Liu
Xuyang Liu
Liangtao Shi
Zunnan Xu
Siteng Huang
Yi Xin
Quanjun Yin
41
5
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
67
41
0
23 May 2024
360Zhinao Technical Report
360Zhinao Team
32
0
0
22 May 2024
How to set AdamW's weight decay as you scale model and dataset size
Xi Wang
Laurence Aitchison
38
9
0
22 May 2024
OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models
Zhaojian Yu
Yinghao Wu
Zhuotao Deng
Yansong Tang
Xiao-Ping Zhang
39
2
0
21 May 2024
Du-IN: Discrete units-guided mask modeling for decoding speech from Intracranial Neural signals
Hui Zheng
Haiteng Wang
Wei-Bang Jiang
Zhongtao Chen
Li He
Pei-Yang Lin
Peng-Hu Wei
Guo-Guang Zhao
Yun-Zhe Liu
41
1
0
19 May 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
55
253
0
16 May 2024
Improving Transformers with Dynamically Composable Multi-Head Attention
Da Xiao
Qingye Meng
Shengping Li
Xingyuan Yuan
26
2
0
14 May 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Peng Gao
Le Zhuo
Ziyi Lin
Ruoyi Du
Xu Luo
...
Weicai Ye
He Tong
Jingwen He
Yu Qiao
Hongsheng Li
VGen
30
81
0
09 May 2024
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
Zhuoyi Yang
Heyang Jiang
Wenyi Hong
Jiayan Teng
Wendi Zheng
Yuxiao Dong
Ming Ding
Jie Tang
SupR
27
5
0
07 May 2024
On the Foundations of Earth and Climate Foundation Models
Xiao Xiang Zhu
Zhitong Xiong
Yi Wang
Adam J. Stewart
Konrad Heidler
Yuanyuan Wang
Zhenghang Yuan
Thomas Dujardin
Qingsong Xu
Yilei Shi
AI4Cl
AI4CE
26
20
0
07 May 2024
What Foundation Models can Bring for Robot Learning in Manipulation : A Survey
Dingzhe Li
Yixiang Jin
A. Yong
Hongze Yu
Jun Shi
Xiaoshuai Hao
Peng Hao
Huaping Liu
Fuchun Sun
Bin Fang
AI4CE
LM&Ro
64
12
0
28 Apr 2024
EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
Hongxia Xie
Chu-Jun Peng
Yu-Wen Tseng
Hung-Jen Chen
Chan-Feng Hsu
Hong-Han Shuai
Wen-Huang Cheng
37
14
0
25 Apr 2024
How to Parameterize Asymmetric Quantization Ranges for Quantization-Aware Training
Jaeseong You
Minseop Park
Kyunggeun Lee
Seokjun An
Chirag I. Patel
Markus Nagel
MQ
31
1
0
25 Apr 2024
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Sachin Mehta
Maxwell Horton
Fartash Faghri
Mohammad Hossein Sekhavat
Mahyar Najibi
Mehrdad Farajtabar
Oncel Tuzel
Mohammad Rastegari
VLM
CLIP
29
6
0
24 Apr 2024
ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability
Xiao Wang
A. Tsaris
Siyan Liu
Jong Youl Choi
Ming Fan
Wei Zhang
Junqi Yin
M. Ashfaq
Dan Lu
Prasanna Balaprakash
22
7
0
23 Apr 2024
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Kevin Slagle
27
3
0
22 Apr 2024
How to Benchmark Vision Foundation Models for Semantic Segmentation?
Tommie Kerssies
Daan de Geus
Gijs Dubbelman
VLM
27
7
0
18 Apr 2024
Pretraining Billion-scale Geospatial Foundational Models on Frontier
A. Tsaris
P. Dias
Abhishek Potnis
Junqi Yin
Feiyi Wang
D. Lunga
AI4CE
19
4
0
17 Apr 2024
Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
Oren Z. Kraus
Kian Kenyon-Dean
Saber Saberian
Maryam Fallah
Peter McLean
...
Chi Vicky Cheng
Kristen Morse
Maureen Makes
Ben Mabey
Berton A. Earnshaw
22
26
0
16 Apr 2024
Anatomy of Industrial Scale Multilingual ASR
Francis McCann Ramirez
Luka Chkhetiani
Andrew Ehrenberg
R. McHardy
Rami Botros
...
Ahmed Efty
Daniel McCrystal
Sam Flamini
Domenic Donato
Takuya Yoshioka
22
7
0
15 Apr 2024
Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding
Yiwen Tang
Ray Zhang
Jiaming Liu
Zoey Guo
Dong Wang
...
Bin Zhao
Shanghang Zhang
Peng Gao
Hongsheng Li
Xuelong Li
33
10
0
11 Apr 2024
Post-Hoc Reversal: Are We Selecting Models Prematurely?
Rishabh Ranjan
Saurabh Garg
Mrigank Raman
Carlos Guestrin
Zachary Chase Lipton
27
0
0
11 Apr 2024
CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers
Lakshmi Nair
VLM
29
0
0
09 Apr 2024
MindSet: Vision. A toolbox for testing DNNs on key psychological experiments
Valerio Biscione
Don Yin
Gaurav Malhotra
Marin Dujmović
M. Montero
...
Rachel Heaton
John E. Hummel
Benjamin D. Evans
Karim G. Habashy
Jeffrey S. Bowers
22
2
0
08 Apr 2024
Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models
Jordan Dotzel
Yash Akhauri
Ahmed S. AbouElhamayed
Carly Jiang
Mohamed S. Abdelfattah
Zhiru Zhang
MoE
15
1
0
07 Apr 2024
Prompting Multi-Modal Tokens to Enhance End-to-End Autonomous Driving Imitation Learning with LLMs
Yiqun Duan
Qiang Zhang
Renjing Xu
36
9
0
07 Apr 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Detai Xin
Xu Tan
Kai Shen
Zeqian Ju
Dongchao Yang
...
Shinnosuke Takamichi
Hiroshi Saruwatari
Shujie Liu
Jinyu Li
Sheng Zhao
29
23
0
04 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen
Qihang Yu
Xiaohui Shen
Alan L. Yuille
Liang-Chieh Chen
3DV
VLM
28
24
0
02 Apr 2024
Can Biases in ImageNet Models Explain Generalization?
Paul Gavrikov
J. Keuper
OOD
VLM
19
11
0
01 Apr 2024
Video Interpolation with Diffusion Models
Siddhant Jain
Daniel Watson
Eric Tabellion
Aleksander Hołyński
Ben Poole
Janne Kontkanen
SupR
VGen
DiffM
22
31
0
01 Apr 2024
The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
Michael Hassid
Tal Remez
Jonas Gehring
Roy Schwartz
Yossi Adi
34
20
0
31 Mar 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
41
7
0
28 Mar 2024
Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data
Yuting Guo
Anthony Ovadje
M. Al-garadi
Abeed Sarker
AI4MH
27
6
0
27 Mar 2024
Scaling Laws For Dense Retrieval
Yan Fang
Jingtao Zhan
Qingyao Ai
Jiaxin Mao
Weihang Su
Jia Chen
Yiqun Liu
94
8
0
27 Mar 2024
Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and Time-Series Analysis
Badri N. Patro
Suhas Ranganath
Vinay P. Namboodiri
Vijay Srinivas Agneeswaran
43
2
0
26 Mar 2024
Tiny Models are the Computational Saver for Large Models
Qingyuan Wang
B. Cardiff
Antoine Frappé
Benoît Larras
Deepu John
24
2
0
26 Mar 2024
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
Yufu Wang
ZiYun Wang
Lingjie Liu
Kostas Daniilidis
37
25
0
26 Mar 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han
Chao Gao
Jinyang Liu
Jeff Zhang
Sai Qian Zhang
139
301
0
21 Mar 2024
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Dominik Wagner
Alexander W. Churchill
Siddharth Sigtia
Panayiotis Georgiou
Matt Mirsamadi
Aarshee Mishra
Erik Marchi
38
6
0
21 Mar 2024
On Pretraining Data Diversity for Self-Supervised Learning
Hasan Hammoud
Tuhin Das
Fabio Pizzati
Philip H. S. Torr
Adel Bibi
Bernard Ghanem
92
2
0
20 Mar 2024
ZigMa: A DiT-style Zigzag Mamba Diffusion Model
Vincent Tao Hu
S. A. Baumann
Ming Gui
Olga Grebenkova
Pingchuan Ma
Johannes S. Fischer
Björn Ommer
35
42
0
20 Mar 2024
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLM
LRM
44
40
0
19 Mar 2024