arXiv:2409.11321
SOAP: Improving and Stabilizing Shampoo using Adam
17 September 2024
Nikhil Vyas
Depen Morwani
Rosie Zhao
Itai Shapira
David Brandfonbrener
Lucas Janson
Sham Kakade
Papers citing
"SOAP: Improving and Stabilizing Shampoo using Adam"
19 / 19 papers shown
Title
ASGO: Adaptive Structured Gradient Optimization
Kang An
Yuxing Liu
Rui Pan
Shiqian Ma
D. Goldfarb
Tong Zhang
ODL
82
2
0
26 Mar 2025
Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation
Yaxiong Chen
Yujie Wang
Zixuan Zheng
Jingliang Hu
Yilei Shi
Shengwu Xiong
Xiao Xiang Zhu
Lichao Mou
47
1
0
18 Mar 2025
Structured Preconditioners in Adaptive Optimization: A Unified Analysis
Shuo Xie
Tianhao Wang
Sashank J. Reddi
Sanjiv Kumar
Zhiyuan Li
43
1
0
13 Mar 2025
CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models
Wei Dai
Peilin Chen
Malinda Lu
Daniel Li
Haowen Wei
Hejie Cui
Paul Pu Liang
LM&MA
42
1
0
09 Mar 2025
LapLoss: Laplacian Pyramid-based Multiscale loss for Image Translation
Krish Didwania
Ishaan Gakhar
Prakhar Arya
Sanskriti Labroo
54
0
0
07 Mar 2025
DEAL-YOLO: Drone-based Efficient Animal Localization using YOLO
Aditya Prashant Naidu
Hem Gosalia
Ishaan Gakhar
Shaurya Singh Rathore
Krish Didwania
Ujjwal Verma
46
0
0
06 Mar 2025
Deep Learning is Not So Mysterious or Different
Andrew Gordon Wilson
36
1
0
03 Mar 2025
NeoBERT: A Next-Generation BERT
Lola Le Breton
Quentin Fournier
Mariam El Mezouar
Sarath Chandar
AI4TS
58
1
0
26 Feb 2025
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
Jinbo Wang
Mingze Wang
Zhanpeng Zhou
Junchi Yan
Weinan E
Lei Wu
65
1
0
26 Feb 2025
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs
Liming Liu
Zhenghao Xu
Zixuan Zhang
Hao Kang
Zichong Li
Chen Liang
Weizhu Chen
T. Zhao
47
1
0
24 Feb 2025
Spectral-factorized Positive-definite Curvature Learning for NN Training
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Roger B. Grosse
39
0
0
10 Feb 2025
Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective
Sifan Wang
Ananyae Kumar Bhartari
Bowen Li
P. Perdikaris
PINN
49
3
0
02 Feb 2025
Physics of Skill Learning
Ziming Liu
Yizhou Liu
Eric J. Michaud
Jeff Gore
Max Tegmark
38
0
0
21 Jan 2025
FOCUS: First Order Concentrated Updating Scheme
Yizhou Liu
Ziming Liu
Jeff Gore
ODL
104
0
0
21 Jan 2025
Grams: Gradient Descent with Adaptive Momentum Scaling
Yang Cao
Xiaoyu Li
Zhao-quan Song
ODL
83
2
0
22 Dec 2024
Cautious Optimizers: Improving Training with One Line of Code
Kaizhao Liang
Lizhang Chen
B. Liu
Qiang Liu
ODL
90
5
0
25 Nov 2024
Moonshine: Speech Recognition for Live Transcription and Voice Commands
Nat Jeffries
Evan King
M. Kudlur
Guy Nicholson
James Wang
Pete Warden
29
5
0
21 Oct 2024
Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Alireza Makhzani
ODL
44
12
0
05 Feb 2024
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld
Iz Beltagy
Pete Walsh
Akshita Bhagia
Rodney Michael Kinney
...
Jesse Dodge
Kyle Lo
Luca Soldaini
Noah A. Smith
Hanna Hajishirzi
OSLM
124
349
0
01 Feb 2024