Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.09871
Cited By
A Theory on Adam Instability in Large-Scale Machine Learning
19 April 2023
Igor Molybog
Peter Albert
Moya Chen
Zach DeVito
David Esiobu
Naman Goyal
Punit Singh Koura
Sharan Narang
Andrew Poulton
Ruan Silva
Binh Tang
Diana Liskovich
Puxin Xu
Yuchen Zhang
Melanie Kambadur
Stephen Roller
Susan Zhang
AI4CE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Theory on Adam Instability in Large-Scale Machine Learning"
9 / 9 papers shown
Title
Revisiting Transformers through the Lens of Low Entropy and Dynamic Sparsity
Ruifeng Ren
Yong Liu
97
0
0
26 Apr 2025
Stochastic Rounding for LLM Training: Theory and Practice
Kaan Ozkara
Tao Yu
Youngsuk Park
36
0
0
27 Feb 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang
Haotian Hu
Zhenyu (Allen) Zhang
Gaojie Jin
X. Li
...
Tianlong Chen
Lu Liu
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
35
0
0
24 Feb 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang
Ziquan Zhu
Gaojie Jin
Lu Liu
Zhangyang Wang
Shiwei Liu
42
1
0
12 Jan 2025
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo
Tan M. Nguyen
MoE
31
3
0
18 Oct 2024
Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology
Eric Zimmermann
Eugene Vorontsov
Julian Viret
Adam Casson
Michal Zelechowski
...
Razik Yousfi
Thomas J. Fuchs
Nicolò Fusi
Siqi Liu
Kristen Severson
MedIm
29
27
0
01 Aug 2024
Jointly Training Large Autoregressive Multimodal Models
Emanuele Aiello
L. Yu
Yixin Nie
Armen Aghajanyan
Barlas Oğuz
11
29
0
27 Sep 2023
Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
...
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
35
80
0
25 Sep 2023
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois
Damien Scieur
Jean-Michel Morel
Pablo Arias
Thomas Eboli
16
0
0
25 May 2023
1