The Two Regimes of Deep Network Training
Guillaume Leclerc, Aleksander Madry
24 February 2020 · arXiv: 2002.10376

Papers citing "The Two Regimes of Deep Network Training"

34 papers

Deep Progressive Training: scaling up depth capacity of zero/one-layer models
Zhiqi Bu
AI4CE
07 Nov 2025

Heavy-Ball Momentum Method in Continuous Time and Discretization Error Analysis
Bochen Lyu, Xiaojing Zhang, Fangyi Zheng, He Wang, Zheng Wang, Zhanxing Zhu
03 Jun 2025

Enlightenment Period Improving DNN Performance
Tiantian Liu, Meng Wan, Jue Wang
02 Apr 2025

Collective variables of neural networks: empirical time evolution and scaling laws
S. Tovey, Sven Krippendorf, M. Spannowsky, Konstantin Nikolaou, Christian Holm
09 Oct 2024

The AdEMAMix Optimizer: Better, Faster, Older
International Conference on Learning Representations (ICLR), 2024
Matteo Pagliardini, Pierre Ablin, David Grangier
ODL
05 Sep 2024

Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller, Mark Dredze, Nicholas Andrews
26 Aug 2024

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Hristo Papazov, Scott Pesme, Nicolas Flammarion
08 Mar 2024

Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent
Naoki Sato, Hideaki Iiduka
ODL
04 Feb 2024

Signal Processing Meets SGD: From Momentum to Filter
Zhipeng Yao, Guisong Chang, Jiaqi Zhang, Qi Zhang, Dazhou Li, Yu Zhang
ODL
06 Nov 2023

When and Why Momentum Accelerates SGD: An Empirical Study
Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen, Na Zheng
15 Jun 2023

A Rainbow in Deep Network Black Boxes
Florentin Guth, Brice Ménard, G. Rochette, S. Mallat
29 May 2023

Effective Neural Network $L_0$ Regularization With BinMask
Kai Jia, Martin Rinard
21 Apr 2023

TRAK: Attributing Model Behavior at Scale
International Conference on Machine Learning (ICML), 2023
Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry
TDI
24 Mar 2023

Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width
Neural Information Processing Systems (NeurIPS), 2023
Dayal Singh Kalra, M. Barkeshli
23 Feb 2023

Continuized Acceleration for Quasar Convex Functions in Non-Convex Optimization
International Conference on Learning Representations (ICLR), 2023
Jun-Kun Wang, Andre Wibisono
15 Feb 2023

Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
A. Vanderschueren, Christophe De Vleeschouwer
MQ
02 Dec 2022

Spectral Evolution and Invariance in Linear-width Neural Networks
Neural Information Processing Systems (NeurIPS), 2022
Zhichao Wang, A. Engel, Anand D. Sarwate, Ioana Dumitriu, Tony Chiang
11 Nov 2022

Towards understanding how momentum improves generalization in deep learning
International Conference on Machine Learning (ICML), 2022
Samy Jelassi, Yuanzhi Li
ODL, MLT, AI4CE
13 Jul 2022

Training Your Sparse Neural Network Better with Any Mask
International Conference on Machine Learning (ICML), 2022
Ajay Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zinan Lin
CVBM
26 Jun 2022

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Neural Information Processing Systems (NeurIPS), 2022
Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
MLT
03 May 2022

On generalization bounds for deep networks based on loss surface implicit regularization
IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2022
Masaaki Imaizumi, Johannes Schmidt-Hieber
ODL
12 Jan 2022

How I Learned to Stop Worrying and Love Retraining
International Conference on Learning Representations (ICLR), 2021
Max Zimmer, Christoph Spiegel, Sebastian Pokutta
CLL
01 Nov 2021

Improved architectures and training algorithms for deep operator networks
Sizhuang He, Hanwen Wang, P. Perdikaris
AI4CE
04 Oct 2021

A Generalizable Approach to Learning Optimizers
Diogo Almeida, Clemens Winter, Jie Tang, Wojciech Zaremba
AI4CE
02 Jun 2021

Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks
Neural Information Processing Systems (NeurIPS), 2021
Hidenori Tanaka, D. Kunin
06 May 2021

Learning to Optimize: A Primer and A Benchmark
Journal of Machine Learning Research (JMLR), 2021
Tianlong Chen, Xiaohan Chen, Wuyang Chen, Howard Heaton, Jialin Liu, Zinan Lin, W. Yin
23 Mar 2021

Pufferfish: Communication-efficient Models At No Extra Cost
Conference on Machine Learning and Systems (MLSys), 2021
Hongyi Wang, Saurabh Agarwal, Dimitris Papailiopoulos
05 Mar 2021

Provable Super-Convergence with a Large Cyclical Learning Rate
IEEE Signal Processing Letters (IEEE SPL), 2021
Samet Oymak
22 Feb 2021

Implicit bias of deep linear networks in the large learning rate phase
Wei Huang, Weitao Du, R. Xu, Chunrui Liu
25 Nov 2020

Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu
04 Nov 2020

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
Neural Information Processing Systems (NeurIPS), 2020
Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli
28 Oct 2020

Deep Networks and the Multiple Manifold Problem
International Conference on Learning Representations (ICLR), 2020
Sam Buchanan, D. Gilboa, John N. Wright
25 Aug 2020

Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities
AAAI Conference on Artificial Intelligence (AAAI), 2020
Alina Ene, Huy Le Nguyen, Adrian Vladu
ODL
17 Jul 2020

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
ODL
04 Mar 2020