The large learning rate phase of deep learning: the catapult mechanism
arXiv:2003.02218, 4 March 2020 [ODL]
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
Papers citing "The large learning rate phase of deep learning: the catapult mechanism"
50 / 183 papers shown (page 1 of 4)
Which Layer Causes Distribution Deviation? Entropy-Guided Adaptive Pruning for Diffusion and Flow Models. Changlin Li, Jiawei Zhang, Z. Shi, Zongxin Yang, Zhihui Li, Xiaojun Chang. 26 Nov 2025. [DiffM, VLM]
On Measuring Localization of Shortcuts in Deep Networks. Nikita Tsoy, Nikola Konstantinov. 30 Oct 2025.
From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD. Konstantinos Christopher Tsiolis, Alireza Mousavi-Hosseini, Murat A. Erdogdu. 23 Oct 2025. [MLT]
Training Dynamics Impact Post-Training Quantization Robustness. Albert Catalan-Tatjer, Niccolò Ajroldi, Jonas Geiping. 07 Oct 2025. [MQ]
Topological Invariance and Breakdown in Learning. Yongyi Yang, Tomaso Poggio, Isaac Chuang, Liu Ziyin. 03 Oct 2025.
Sharpness of Minima in Deep Matrix Factorization. Anil Kamber, Rahul Parhi. 30 Sep 2025. [FAtt]
Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region. Shuang Liang, Guido Montúfar. 29 Sep 2025.
Intuition emerges in Maximum Caliber models at criticality. Lluís Arola-Fernández. 08 Aug 2025.
What Can Grokking Teach Us About Learning Under Nonstationarity? Clare Lyle, Gharda Sokar, Razvan Pascanu, András Gyorgy. 26 Jul 2025.
Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility. Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou, Tolga Birdal. 23 Jul 2025.
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful. Martin Marek, Sanae Lotfi, Aditya Somasundaram, A. Wilson, Micah Goldblum. 09 Jul 2025. [LRM]
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks. D. Kunin, Giovanni Luca Marchetti, F. Chen, Dhruva Karkada, James B. Simon, M. DeWeese, Surya Ganguli, Nina Miolane. 06 Jun 2025.
Adaptive Preconditioners Trigger Loss Spikes in Adam. Zhiwei Bai, Zhangchen Zhou, Jiajie Zhao, Xiaolong Li, Zhiyu Li, Feiyu Xiong, Hongkang Yang, Yaoyu Zhang, Z. Xu. 05 Jun 2025. [ODL]
Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape. Ioannis Bantzis, James B. Simon, Arthur Jacot. 27 May 2025. [ODL]
A Model Zoo on Phase Transitions in Neural Networks. Konstantin Schurholt, Léo Meynent, Yefan Zhou, Haiquan Lu, Yaoqing Yang, Damian Borth. 25 Apr 2025.
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes. Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett. 05 Apr 2025.
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition. Chengxiang Huang, Yake Wei, Zequn Yang, D. Hu. Computer Vision and Pattern Recognition (CVPR), 2025. 24 Mar 2025.
On the Cone Effect in the Learning Dynamics. Zhanpeng Zhou, Yongyi Yang, Jie Ren, Mahito Sugiyama, Junchi Yan. 20 Mar 2025.
A Minimalist Example of Edge-of-Stability and Progressive Sharpening. Liming Liu, Zixuan Zhang, S. Du, T. Zhao. 04 Mar 2025.
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos. Dayal Singh Kalra, Tianyu He, M. Barkeshli. 17 Feb 2025.
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks. Pierfrancesco Beneventano, Blake Woodworth. 15 Jan 2025. [MLT]
A ghost mechanism: An analytical model of abrupt learning in recurrent networks. Fatih Dinc, Ege Cirakman, Yiqi Jiang, Mert Yuksekgonul, Mark J. Schnitzer, Hidenori Tanaka. 04 Jan 2025.
Can Stability be Detrimental? Better Generalization through Gradient Descent Instabilities. Lawrence Wang, Stephen J. Roberts. 23 Dec 2024.
Proportional infinite-width infinite-depth limit for deep linear neural networks. Federico Bassetti, Lucia Ladelli, P. Rotondo. 22 Nov 2024.
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks. Jim Zhao, Sidak Pal Singh, Aurelien Lucchi. Neural Information Processing Systems (NeurIPS), 2024. 04 Nov 2024. [AI4CE]
Where Do Large Learning Rates Lead Us? Ildus Sadrtdinov, M. Kodryan, Eduard Pokonechny, E. Lobacheva, Dmitry Vetrov. Neural Information Processing Systems (NeurIPS), 2024. 29 Oct 2024. [AI4CE]
Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP). Mohammad Asif Ibna Mustafa, Ferdinand Heinrich. 14 Oct 2024. [AI4TS]
Collective variables of neural networks: empirical time evolution and scaling laws. S. Tovey, Sven Krippendorf, M. Spannowsky, Konstantin Nikolaou, Christian Holm. 09 Oct 2024.
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse. Arthur Jacot, Peter Súkeník, Zihan Wang, Marco Mondelli. International Conference on Learning Representations (ICLR), 2024. 07 Oct 2024.
The Optimization Landscape of SGD Across the Feature Learning Strength. Alexander B. Atanasov, Alexandru Meterez, James B. Simon, Cengiz Pehlevan. International Conference on Learning Representations (ICLR), 2024. 06 Oct 2024.
Grokking at the Edge of Linear Separability. Alon Beck, Noam Levi, Yohai Bar-Sinai. 06 Oct 2024.
SGD with memory: fundamental properties and stochastic acceleration. Dmitry Yarotsky, Maksim Velikanov. International Conference on Learning Representations (ICLR), 2024. 05 Oct 2024.
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks. Clémentine Dominé, Nicolas Anguita, A. Proca, Lukas Braun, D. Kunin, P. Mediano, Andrew M. Saxe. International Conference on Learning Representations (ICLR), 2024. 22 Sep 2024.
Efficient Training of Large Vision Models via Advanced Automated Progressive Learning. Changlin Li, Jiawei Zhang, Sihao Lin, Zongxin Yang, Junwei Liang, Xiaodan Liang, Xiaojun Chang. 06 Sep 2024. [VLM]
Do Sharpness-based Optimizers Improve Generalization in Medical Image Analysis? Mohamed Hassan, Aleksandar Vakanski, Min Xian. IEEE Access, 2024. 07 Aug 2024. [AAML, MedIm]
Stepping on the Edge: Curvature Aware Learning Rate Tuners. Vincent Roulet, Atish Agarwala, Jean-Bastien Grill, Grzegorz Swirszcz, Mathieu Blondel, Fabian Pedregosa. 08 Jul 2024.
Normalization and effective learning rates in reinforcement learning. Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, H. V. Hasselt, Razvan Pascanu, Will Dabney. 01 Jul 2024.
Why Warmup the Learning Rate? Underlying Mechanisms and Improvements. Dayal Singh Kalra, M. Barkeshli. 13 Jun 2024.
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes. Dan Qiao, Kaiqi Zhang, Esha Singh, Daniel Soudry, Yu-Xiang Wang. 10 Jun 2024. [NoLa]
Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning. D. Kunin, Allan Raventós, Clémentine Dominé, Feng Chen, David Klindt, Andrew M. Saxe, Surya Ganguli. Neural Information Processing Systems (NeurIPS), 2024. 10 Jun 2024. [MLT]
Error Bounds of Supervised Classification from Information-Theoretic Perspective. Binchuan Qi, Wei Gong, Li Li. 07 Jun 2024.
From Spikes to Heavy Tails: Unveiling the Spectral Evolution of Neural Networks. Vignesh Kothapalli, Tianyu Pang, Shenyang Deng, Zongmin Liu, Yaoqing Yang. 07 Jun 2024.
Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks. Stefano Sarao Mannelli, Yaraslau Ivashinka, Andrew M. Saxe, Luca Saglietti. 03 Jun 2024.
Understanding Token Probability Encoding in Output Embeddings. Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue. 03 Jun 2024.
Mixed Dynamics In Linear Networks: Unifying the Lazy and Active Regimes. Zhenfeng Tu, Santiago Aranguri, Arthur Jacot. 27 May 2024.
Scalable Optimization in the Modular Norm. Tim Large, Yang Liu, Minyoung Huh, Hyojin Bahng, Phillip Isola, Jeremy Bernstein. Neural Information Processing Systems (NeurIPS), 2024. 23 May 2024.
Deep linear networks for regression are implicitly regularized towards flat minima. Pierre Marion, Lénaic Chizat. 22 May 2024. [ODL]
Learning in PINNs: Phase transition, total diffusion, and generalization. Sokratis J. Anagnostopoulos, Juan Diego Toscano, Nikolaos Stergiopulos, George Karniadakis. 27 Mar 2024.
A Survey on Evaluation of Out-of-Distribution Generalization. Han Yu, Tianyu Wang, Xingxuan Zhang, Jiayun Wu, Peng Cui. 04 Mar 2024. [OOD]
Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning. Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto. 27 Feb 2024.