Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2309.14322
Cited By
Small-scale proxies for large-scale Transformer training instabilities
25 September 2023
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
Ben Adlam
John D. Co-Reyes
Izzeddin Gur
Abhishek Kumar
Roman Novak
Jeffrey Pennington
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Small-scale proxies for large-scale Transformer training instabilities"
22 / 72 papers shown
Title
Computational Tradeoffs in Image Synthesis: Diffusion, Masked-Token, and Next-Token Prediction
Maciej Kilian
Varun Jampani
Luke Zettlemoyer
DiffM
19
8
0
21 May 2024
FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information
Dongseong Hwang
ODL
24
4
0
21 May 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
53
249
0
16 May 2024
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Kevin Slagle
27
3
0
22 Apr 2024
Anatomy of Industrial Scale Multilingual ASR
Francis McCann Ramirez
Luka Chkhetiani
Andrew Ehrenberg
R. McHardy
Rami Botros
...
Ahmed Efty
Daniel McCrystal
Sam Flamini
Domenic Donato
Takuya Yoshioka
19
7
0
15 Apr 2024
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Shengding Hu
Yuge Tu
Xu Han
Chaoqun He
Ganqu Cui
...
Chaochao Jia
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
MoE
38
275
0
09 Apr 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie
Zhe Gan
J. Fauconnier
Sam Dodge
Bowen Zhang
...
Zirui Wang
Ruoming Pang
Peter Grasch
Alexander Toshev
Yinfei Yang
MLLM
27
185
0
14 Mar 2024
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM
ELM
LRM
100
40
0
13 Mar 2024
Batch size invariant Adam
Xi Wang
Laurence Aitchison
31
2
0
29 Feb 2024
Disentangling the Causes of Plasticity Loss in Neural Networks
Clare Lyle
Zeyu Zheng
Khimya Khetarpal
H. V. Hasselt
Razvan Pascanu
James Martens
Will Dabney
AI4CE
53
30
0
29 Feb 2024
Stable LM 2 1.6B Technical Report
Marco Bellagente
J. Tow
Dakota Mahan
Duy Phung
Maksym Zhuravinskyi
...
Paulo Rocha
Harry Saini
H. Teufel
Niccoló Zanichelli
Carlos Riquelme
OSLM
29
52
0
27 Feb 2024
Massive Activations in Large Language Models
Mingjie Sun
Xinlei Chen
J. Zico Kolter
Zhuang Liu
60
64
0
27 Feb 2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Ruoyu Sun
Zhimin Luo
16
39
0
26 Feb 2024
Bridging Associative Memory and Probabilistic Modeling
Rylan Schaeffer
Nika Zahedi
Mikail Khona
Dhruv Pai
Sang T. Truong
...
Sarthak Chandra
Andres Carranza
Ila Rani Fiete
Andrey Gromov
Oluwasanmi Koyejo
DiffM
43
4
0
15 Feb 2024
Spike No More: Stabilizing the Pre-training of Large Language Models
Sho Takase
Shun Kiyono
Sosuke Kobayashi
Jun Suzuki
13
13
0
28 Dec 2023
On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width
Satoki Ishikawa
Ryo Karakida
10
2
0
19 Dec 2023
StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
Shida Wang
Qianxiao Li
17
12
0
24 Nov 2023
Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Shuangfei Zhai
Tatiana Likhomanenko
Etai Littwin
Dan Busbridge
Jason Ramapuram
Yizhe Zhang
Jiatao Gu
J. Susskind
AAML
38
64
0
11 Mar 2023
Meta-Principled Family of Hyperparameter Scaling Strategies
Sho Yaida
50
16
0
10 Oct 2022
Transformer Quality in Linear Time
Weizhe Hua
Zihang Dai
Hanxiao Liu
Quoc V. Le
71
220
0
21 Feb 2022
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren
Samyam Rajbhandari
Reza Yazdani Aminabadi
Olatunji Ruwase
Shuangyang Yang
Minjia Zhang
Dong Li
Yuxiong He
MoE
160
399
0
18 Jan 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
223
4,424
0
23 Jan 2020
Previous
1
2