arXiv: 1905.03776
The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
Daniel S. Park, Jascha Narain Sohl-Dickstein, Quoc V. Le, Samuel L. Smith
9 May 2019

Cited By
Papers citing "The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study" (17 / 17 papers shown)
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano, Blake Woodworth
MLT
15 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov, Jan Ebert, Jiangtao Wang, Stefan Kesselheim
10 Jan 2025
Controlled Descent Training
Viktor Andersson, B. Varga, Vincent Szolnoky, Andreas Syrén, Rebecka Jörnsten, Balázs Kulcsár
16 Mar 2023
Bayesian Generational Population-Based Training
Xingchen Wan, Cong Lu, Jack Parker-Holder, Philip J. Ball, Vu-Linh Nguyen, Binxin Ru, Michael A. Osborne
OffRL
19 Jul 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao
07 Mar 2022
Optimal learning rate schedules in high-dimensional non-convex optimization problems
Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli
09 Feb 2022
Resource-Aware Pareto-Optimal Automated Machine Learning Platform
Yao Yang, Andrew Nam, M. Nasr-Azadani, Teresa Tung
30 Oct 2020
It's Hard for Neural Networks To Learn the Game of Life
Jacob Mitchell Springer, Garrett Kenyon
03 Sep 2020
New Interpretations of Normalization Methods in Deep Learning
Jiacheng Sun, Xiangyong Cao, Hanwen Liang, Weiran Huang, Zewei Chen, Zhenguo Li
16 Jun 2020
To Each Optimizer a Norm, To Each Norm its Generalization
Sharan Vaswani, Reza Babanezhad, Jose Gallego, Aaron Mishkin, Simon Lacoste-Julien, Nicolas Le Roux
11 Jun 2020
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
ODL
04 Mar 2020
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De, Samuel L. Smith
ODL
24 Feb 2020
On the infinite width limit of neural networks with a standard parameterization
Jascha Narain Sohl-Dickstein, Roman Novak, S. Schoenholz, Jaehoon Lee
21 Jan 2020
Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
Sanjeev Arora, S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu
AAML
03 Oct 2019
The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks
Ryo Karakida, S. Akaho, S. Amari
07 Jun 2019
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras, S. Laine, Timo Aila
12 Dec 2018
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL
15 Sep 2016