The large learning rate phase of deep learning: the catapult mechanism
arXiv:2003.02218, 4 March 2020 [ODL]
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
Papers citing "The large learning rate phase of deep learning: the catapult mechanism"
50 / 183 papers shown (page 1 of 4)
Which Layer Causes Distribution Deviation? Entropy-Guided Adaptive Pruning for Diffusion and Flow Models. Changlin Li, Jiawei Zhang, Z. Shi, Zongxin Yang, Zhihui Li, Xiaojun Chang. 26 Nov 2025. [DiffM, VLM]
On Measuring Localization of Shortcuts in Deep Networks. Nikita Tsoy, Nikola Konstantinov. 30 Oct 2025.
From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD. Konstantinos Christopher Tsiolis, Alireza Mousavi-Hosseini, Murat A. Erdogdu. 23 Oct 2025. [MLT]
Training Dynamics Impact Post-Training Quantization Robustness. Albert Catalan-Tatjer, Niccolò Ajroldi, Jonas Geiping. 07 Oct 2025. [MQ]
Topological Invariance and Breakdown in Learning. Yongyi Yang, Tomaso Poggio, Isaac Chuang, Liu Ziyin. 03 Oct 2025.
Sharpness of Minima in Deep Matrix Factorization. Anil Kamber, Rahul Parhi. 30 Sep 2025. [FAtt]
Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region. Shuang Liang, Guido Montúfar. 29 Sep 2025.
Intuition emerges in Maximum Caliber models at criticality. Lluís Arola-Fernández. 08 Aug 2025.
What Can Grokking Teach Us About Learning Under Nonstationarity? Clare Lyle, Gharda Sokar, Razvan Pascanu, András Gyorgy. 26 Jul 2025.
Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility. Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou, Tolga Birdal. 23 Jul 2025.
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful. Martin Marek, Sanae Lotfi, Aditya Somasundaram, A. Wilson, Micah Goldblum. 09 Jul 2025. [LRM]
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks. D. Kunin, Giovanni Luca Marchetti, F. Chen, Dhruva Karkada, James B. Simon, M. DeWeese, Surya Ganguli, Nina Miolane. 06 Jun 2025.
Adaptive Preconditioners Trigger Loss Spikes in Adam. Zhiwei Bai, Zhangchen Zhou, Jiajie Zhao, Xiaolong Li, Zhiyu Li, Feiyu Xiong, Hongkang Yang, Yaoyu Zhang, Z. Xu. 05 Jun 2025. [ODL]
Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape. Ioannis Bantzis, James B. Simon, Arthur Jacot. 27 May 2025. [ODL]
A Model Zoo on Phase Transitions in Neural Networks. Konstantin Schurholt, Léo Meynent, Yefan Zhou, Haiquan Lu, Yaoqing Yang, Damian Borth. 25 Apr 2025.
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes. Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett. 05 Apr 2025.
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition. Chengxiang Huang, Yake Wei, Zequn Yang, D. Hu. Computer Vision and Pattern Recognition (CVPR), 2025. 24 Mar 2025.
On the Cone Effect in the Learning Dynamics. Zhanpeng Zhou, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan. 20 Mar 2025.
A Minimalist Example of Edge-of-Stability and Progressive Sharpening. Liming Liu, Zixuan Zhang, S. Du, T. Zhao. 04 Mar 2025.
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos. Dayal Singh Kalra, Tianyu He, M. Barkeshli. 17 Feb 2025.
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks. Pierfrancesco Beneventano, Blake Woodworth. 15 Jan 2025. [MLT]
A ghost mechanism: An analytical model of abrupt learning in recurrent networks. Fatih Dinc, Ege Cirakman, Yiqi Jiang, Mert Yuksekgonul, Mark J. Schnitzer, Hidenori Tanaka. 04 Jan 2025.
Can Stability be Detrimental? Better Generalization through Gradient Descent Instabilities. Lawrence Wang, Stephen J. Roberts. 23 Dec 2024.
Proportional infinite-width infinite-depth limit for deep linear neural networks. Federico Bassetti, Lucia Ladelli, P. Rotondo. 22 Nov 2024.
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks. Jim Zhao, Sidak Pal Singh, Aurelien Lucchi. Neural Information Processing Systems (NeurIPS), 2024. 04 Nov 2024. [AI4CE]
Where Do Large Learning Rates Lead Us? Ildus Sadrtdinov, M. Kodryan, Eduard Pokonechny, E. Lobacheva, Dmitry Vetrov. Neural Information Processing Systems (NeurIPS), 2024. 29 Oct 2024. [AI4CE]
Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP). Mohammad Asif Ibna Mustafa, Ferdinand Heinrich. 14 Oct 2024. [AI4TS]
Collective variables of neural networks: empirical time evolution and scaling laws. S. Tovey, Sven Krippendorf, M. Spannowsky, Konstantin Nikolaou, Christian Holm. 09 Oct 2024.
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse. Arthur Jacot, Peter Súkeník, Zihan Wang, Marco Mondelli. International Conference on Learning Representations (ICLR), 2024. 07 Oct 2024.
The Optimization Landscape of SGD Across the Feature Learning Strength. Alexander B. Atanasov, Alexandru Meterez, James B. Simon, Cengiz Pehlevan. International Conference on Learning Representations (ICLR), 2024. 06 Oct 2024.
Grokking at the Edge of Linear Separability. Alon Beck, Noam Levi, Yohai Bar-Sinai. 06 Oct 2024.
SGD with memory: fundamental properties and stochastic acceleration. Dmitry Yarotsky, Maksim Velikanov. International Conference on Learning Representations (ICLR), 2024. 05 Oct 2024.
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks. Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe. International Conference on Learning Representations (ICLR), 2024. 22 Sep 2024.
Efficient Training of Large Vision Models via Advanced Automated Progressive Learning. Changlin Li, Jiawei Zhang, Sihao Lin, Zongxin Yang, Junwei Liang, Xiaodan Liang, Xiaojun Chang. 06 Sep 2024. [VLM]
Do Sharpness-based Optimizers Improve Generalization in Medical Image Analysis? Mohamed Hassan, Aleksandar Vakanski, Min Xian. IEEE Access, 2024. 07 Aug 2024. [AAML, MedIm]
Stepping on the Edge: Curvature Aware Learning Rate Tuners. Vincent Roulet, Atish Agarwala, Jean-Bastien Grill, Grzegorz Swirszcz, Mathieu Blondel, Fabian Pedregosa. 08 Jul 2024.
Normalization and effective learning rates in reinforcement learning. Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, H. V. Hasselt, Razvan Pascanu, Will Dabney. 01 Jul 2024.
Why Warmup the Learning Rate? Underlying Mechanisms and Improvements. Dayal Singh Kalra, M. Barkeshli. 13 Jun 2024.
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes. Dan Qiao, Kaiqi Zhang, Esha Singh, Daniel Soudry, Yu-Xiang Wang. 10 Jun 2024. [NoLa]
Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning. D. Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew M. Saxe, Surya Ganguli. Neural Information Processing Systems (NeurIPS), 2024. 10 Jun 2024. [MLT]
Error Bounds of Supervised Classification from Information-Theoretic Perspective. Binchuan Qi, Wei Gong, Li Li. 07 Jun 2024.
From Spikes to Heavy Tails: Unveiling the Spectral Evolution of Neural Networks. Vignesh Kothapalli, Tianyu Pang, Shenyang Deng, Zongmin Liu, Yaoqing Yang. 07 Jun 2024.
Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks. Stefano Sarao Mannelli, Yaraslau Ivashinka, Andrew M. Saxe, Luca Saglietti. 03 Jun 2024.
Understanding Token Probability Encoding in Output Embeddings. Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue. 03 Jun 2024.
Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes. Zhenfeng Tu, Santiago Aranguri, Arthur Jacot. 27 May 2024.
Scalable Optimization in the Modular Norm. Tim Large, Yang Liu, Minyoung Huh, Hyojin Bahng, Phillip Isola, Jeremy Bernstein. Neural Information Processing Systems (NeurIPS), 2024. 23 May 2024.
Deep linear networks for regression are implicitly regularized towards flat minima. Pierre Marion, Lénaic Chizat. 22 May 2024. [ODL]
Learning in PINNs: Phase transition, total diffusion, and generalization. Sokratis J. Anagnostopoulos, Juan Diego Toscano, Nikolaos Stergiopulos, George Karniadakis. 27 Mar 2024.
A Survey on Evaluation of Out-of-Distribution Generalization. Han Yu, Tianyu Wang, Xingxuan Zhang, Jiayun Wu, Peng Cui. 04 Mar 2024. [OOD]
Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning. Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto. 27 Feb 2024.