The Two Regimes of Deep Network Training
Guillaume Leclerc, Aleksander Madry
arXiv:2002.10376, 24 February 2020
Papers citing "The Two Regimes of Deep Network Training" (34 papers)

Deep Progressive Training: scaling up depth capacity of zero/one-layer models
Zhiqi Bu. 07 Nov 2025.

Heavy-Ball Momentum Method in Continuous Time and Discretization Error Analysis
Bochen Lyu, Xiaojing Zhang, Fangyi Zheng, He Wang, Zheng Wang, Zhanxing Zhu. 03 Jun 2025.

Enlightenment Period Improving DNN Performance
Tiantian Liu, Meng Wan, Jue Wang. 02 Apr 2025.

Collective variables of neural networks: empirical time evolution and scaling laws
S. Tovey, Sven Krippendorf, M. Spannowsky, Konstantin Nikolaou, Christian Holm. 09 Oct 2024.

The AdEMAMix Optimizer: Better, Faster, Older
Matteo Pagliardini, Pierre Ablin, David Grangier. International Conference on Learning Representations (ICLR), 2024. 05 Sep 2024.

Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller, Mark Dredze, Nicholas Andrews. 26 Aug 2024.

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
Hristo Papazov, Scott Pesme, Nicolas Flammarion. International Conference on Artificial Intelligence and Statistics (AISTATS), 2024. 08 Mar 2024.

Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent
Naoki Sato, Hideaki Iiduka. 04 Feb 2024.

Signal Processing Meets SGD: From Momentum to Filter
Zhipeng Yao, Guisong Chang, Jiaqi Zhang, Qi Zhang, Dazhou Li, Yu Zhang. 06 Nov 2023.

When and Why Momentum Accelerates SGD: An Empirical Study
Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen, Na Zheng. 15 Jun 2023.

A Rainbow in Deep Network Black Boxes
Florentin Guth, Brice Ménard, G. Rochette, S. Mallat. 29 May 2023.

Effective Neural Network L_0 Regularization With BinMask
Kai Jia, Martin Rinard. 21 Apr 2023.

TRAK: Attributing Model Behavior at Scale
Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry. International Conference on Machine Learning (ICML), 2023. 24 Mar 2023.

Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width
Dayal Singh Kalra, M. Barkeshli. Neural Information Processing Systems (NeurIPS), 2023. 23 Feb 2023.

Continuized Acceleration for Quasar Convex Functions in Non-Convex Optimization
Jun-Kun Wang, Andre Wibisono. International Conference on Learning Representations (ICLR), 2023. 15 Feb 2023.

Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?
A. Vanderschueren, Christophe De Vleeschouwer. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022. 02 Dec 2022.

Spectral Evolution and Invariance in Linear-width Neural Networks
Zhichao Wang, A. Engel, Anand D. Sarwate, Ioana Dumitriu, Tony Chiang. Neural Information Processing Systems (NeurIPS), 2022. 11 Nov 2022.

Towards understanding how momentum improves generalization in deep learning
Samy Jelassi, Yuanzhi Li. International Conference on Machine Learning (ICML), 2022. 13 Jul 2022.

Training Your Sparse Neural Network Better with Any Mask
Ajay Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zinan Lin. International Conference on Machine Learning (ICML), 2022. 26 Jun 2022.

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang. Neural Information Processing Systems (NeurIPS), 2022. 03 May 2022.

On generalization bounds for deep networks based on loss surface implicit regularization
Masaaki Imaizumi, Johannes Schmidt-Hieber. IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2022. 12 Jan 2022.

How I Learned to Stop Worrying and Love Retraining
Max Zimmer, Christoph Spiegel, Sebastian Pokutta. International Conference on Learning Representations (ICLR), 2021. 01 Nov 2021.

Improved architectures and training algorithms for deep operator networks
Sizhuang He, Hanwen Wang, P. Perdikaris. 04 Oct 2021.

A Generalizable Approach to Learning Optimizers
Diogo Almeida, Clemens Winter, Jie Tang, Wojciech Zaremba. 02 Jun 2021.

Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks
Hidenori Tanaka, D. Kunin. Neural Information Processing Systems (NeurIPS), 2021. 06 May 2021.

Learning to Optimize: A Primer and A Benchmark
Tianlong Chen, Xiaohan Chen, Wuyang Chen, Howard Heaton, Jialin Liu, Zinan Lin, W. Yin. Journal of Machine Learning Research (JMLR), 2021. 23 Mar 2021.

Pufferfish: Communication-efficient Models At No Extra Cost
Hongyi Wang, Saurabh Agarwal, Dimitris Papailiopoulos. Conference on Machine Learning and Systems (MLSys), 2021. 05 Mar 2021.

Provable Super-Convergence with a Large Cyclical Learning Rate
Samet Oymak. IEEE Signal Processing Letters (IEEE SPL), 2021. 22 Feb 2021.

Implicit bias of deep linear networks in the large learning rate phase
Wei Huang, Weitao Du, R. Xu, Chunrui Liu. 25 Nov 2020.

Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu. 04 Nov 2020.

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli. Neural Information Processing Systems (NeurIPS), 2020. 28 Oct 2020.

Deep Networks and the Multiple Manifold Problem
Sam Buchanan, D. Gilboa, John N. Wright. International Conference on Learning Representations (ICLR), 2020. 25 Aug 2020.

Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities
Alina Ene, Huy Le Nguyen, Adrian Vladu. AAAI Conference on Artificial Intelligence (AAAI), 2020. 17 Jul 2020.

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari. 04 Mar 2020.