The Two Regimes of Deep Network Training
Guillaume Leclerc, Aleksander Madry
24 February 2020 · arXiv: 2002.10376

Papers citing "The Two Regimes of Deep Network Training"

34 papers

Deep Progressive Training: scaling up depth capacity of zero/one-layer models
Zhiqi Bu
AI4CE
07 Nov 2025

Heavy-Ball Momentum Method in Continuous Time and Discretization Error Analysis
Bochen Lyu, Xiaojing Zhang, Fangyi Zheng, He Wang, Zheng Wang, Zhanxing Zhu
03 Jun 2025

Enlightenment Period Improving DNN Performance
Tiantian Liu, Meng Wan, Jue Wang
02 Apr 2025

Collective variables of neural networks: empirical time evolution and scaling laws
S. Tovey, Sven Krippendorf, M. Spannowsky, Konstantin Nikolaou, Christian Holm
09 Oct 2024

The AdEMAMix Optimizer: Better, Faster, Older
International Conference on Learning Representations (ICLR), 2024
Matteo Pagliardini, Pierre Ablin, David Grangier
ODL
05 Sep 2024

Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller, Mark Dredze, Nicholas Andrews
26 Aug 2024

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Hristo Papazov, Scott Pesme, Nicolas Flammarion
08 Mar 2024

Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent
Naoki Sato, Hideaki Iiduka
ODL
04 Feb 2024

Signal Processing Meets SGD: From Momentum to Filter
Zhipeng Yao, Guisong Chang, Jiaqi Zhang, Qi Zhang, Dazhou Li, Yu Zhang
ODL
06 Nov 2023

When and Why Momentum Accelerates SGD: An Empirical Study
Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen, Na Zheng
15 Jun 2023

A Rainbow in Deep Network Black Boxes
Florentin Guth, Brice Ménard, G. Rochette, S. Mallat
29 May 2023

Effective Neural Network $L_0$ Regularization With BinMask
Kai Jia, Martin Rinard
21 Apr 2023

TRAK: Attributing Model Behavior at Scale
International Conference on Machine Learning (ICML), 2023
Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry
TDI
24 Mar 2023

Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width
Neural Information Processing Systems (NeurIPS), 2023
Dayal Singh Kalra, M. Barkeshli
23 Feb 2023

Continuized Acceleration for Quasar Convex Functions in Non-Convex Optimization
International Conference on Learning Representations (ICLR), 2023
Jun-Kun Wang, Andre Wibisono
15 Feb 2023

Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
A. Vanderschueren, Christophe De Vleeschouwer
MQ
02 Dec 2022

Spectral Evolution and Invariance in Linear-width Neural Networks
Neural Information Processing Systems (NeurIPS), 2022
Zhichao Wang, A. Engel, Anand D. Sarwate, Ioana Dumitriu, Tony Chiang
11 Nov 2022

Towards understanding how momentum improves generalization in deep learning
International Conference on Machine Learning (ICML), 2022
Samy Jelassi, Yuanzhi Li
ODL, MLT, AI4CE
13 Jul 2022

Training Your Sparse Neural Network Better with Any Mask
International Conference on Machine Learning (ICML), 2022
Ajay Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zinan Lin
CVBM
26 Jun 2022

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Neural Information Processing Systems (NeurIPS), 2022
Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
MLT
03 May 2022

On generalization bounds for deep networks based on loss surface implicit regularization
IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2022
Masaaki Imaizumi, Johannes Schmidt-Hieber
ODL
12 Jan 2022

How I Learned to Stop Worrying and Love Retraining
International Conference on Learning Representations (ICLR), 2021
Max Zimmer, Christoph Spiegel, Sebastian Pokutta
CLL
01 Nov 2021

Improved architectures and training algorithms for deep operator networks
Sizhuang He, Hanwen Wang, P. Perdikaris
AI4CE
04 Oct 2021

A Generalizable Approach to Learning Optimizers
Diogo Almeida, Clemens Winter, Jie Tang, Wojciech Zaremba
AI4CE
02 Jun 2021

Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks
Neural Information Processing Systems (NeurIPS), 2021
Hidenori Tanaka, D. Kunin
06 May 2021

Learning to Optimize: A Primer and A Benchmark
Journal of Machine Learning Research (JMLR), 2021
Tianlong Chen, Xiaohan Chen, Wuyang Chen, Howard Heaton, Jialin Liu, Zinan Lin, W. Yin
23 Mar 2021

Pufferfish: Communication-efficient Models At No Extra Cost
Conference on Machine Learning and Systems (MLSys), 2021
Hongyi Wang, Saurabh Agarwal, Dimitris Papailiopoulos
05 Mar 2021

Provable Super-Convergence with a Large Cyclical Learning Rate
IEEE Signal Processing Letters (IEEE SPL), 2021
Samet Oymak
22 Feb 2021

Implicit bias of deep linear networks in the large learning rate phase
Wei Huang, Weitao Du, R. Xu, Chunrui Liu
25 Nov 2020

Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu
04 Nov 2020

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
Neural Information Processing Systems (NeurIPS), 2020
Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli
28 Oct 2020

Deep Networks and the Multiple Manifold Problem
International Conference on Learning Representations (ICLR), 2020
Sam Buchanan, D. Gilboa, John N. Wright
25 Aug 2020

Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities
AAAI Conference on Artificial Intelligence (AAAI), 2020
Alina Ene, Huy Le Nguyen, Adrian Vladu
ODL
17 Jul 2020

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
ODL
04 Mar 2020