Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.13013
Cited By
Stable and low-precision training for large-scale vision-language models
25 April 2023
Mitchell Wortsman
Tim Dettmers
Luke Zettlemoyer
Ari S. Morcos
Ali Farhadi
Ludwig Schmidt
MQ
MLLM
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Stable and low-precision training for large-scale vision-language models"
19 / 19 papers shown
Title
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang
Haotian Hu
Zhenyu (Allen) Zhang
Gaojie Jin
X. Li
...
Tianlong Chen
Lu Liu
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
35
0
0
24 Feb 2025
Sublinear Variational Optimization of Gaussian Mixture Models with Millions to Billions of Parameters
Sebastian Salwig
Till Kahlke
F. Hirschberger
D. Forster
Jorg Lucke
VLM
82
0
0
21 Jan 2025
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi
Han Cai
Ligeng Zhu
Y. Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
60
9
0
25 Oct 2024
Enhancing Large Language Model-based Speech Recognition by Contextualization for Rare and Ambiguous Words
Kento Nozawa
Takashi Masuko
Toru Taniguchi
43
1
0
15 Aug 2024
Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology
Eric Zimmermann
Eugene Vorontsov
Julian Viret
Adam Casson
Michal Zelechowski
...
Razik Yousfi
Thomas J. Fuchs
Nicolò Fusi
Siqi Liu
Kristen Severson
MedIm
29
27
0
01 Aug 2024
u-
μ
\mu
μ
P: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
51
9
0
24 Jul 2024
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
34
3
0
29 May 2024
Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
Jia Guo
Shuai Lu
Weihang Zhang
Huiqi Li
Huiqi Li
Hongen Liao
ViT
59
7
0
23 May 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao
Zhenyu (Allen) Zhang
Beidi Chen
Zhangyang Wang
A. Anandkumar
Yuandong Tian
41
173
0
06 Mar 2024
Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
...
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
35
80
0
25 Sep 2023
Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Shuangfei Zhai
Tatiana Likhomanenko
Etai Littwin
Dan Busbridge
Jason Ramapuram
Yizhe Zhang
Jiatao Gu
J. Susskind
AAML
38
64
0
11 Mar 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
245
1,071
0
05 Oct 2022
FP8 Formats for Deep Learning
Paulius Micikevicius
Dusan Stosic
N. Burgess
Marius Cornea
Pradeep Dubey
...
Naveen Mellempudi
S. Oberman
M. Shoeybi
Michael Siu
Hao Wu
BDL
VLM
MQ
67
121
0
12 Sep 2022
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,764
0
24 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
220
510
0
11 Feb 2021
FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference
D. Khudia
Jianyu Huang
Protonu Basu
Summer Deng
Haixin Liu
Jongsoo Park
M. Smelyanskiy
FedML
MQ
32
44
0
13 Jan 2021
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
86
336
0
05 Jan 2021
BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai
Wei Zhang
Lu Hou
Lifeng Shang
Jing Jin
Xin Jiang
Qun Liu
Michael Lyu
Irwin King
MQ
138
221
0
31 Dec 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
225
574
0
12 Sep 2019
1