The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
9 May 2019
Daniel S. Park, Jascha Narain Sohl-Dickstein, Quoc V. Le, Samuel L. Smith
ArXiv · PDF · HTML

Papers citing "The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study"

17 / 17 papers shown
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano, Blake Woodworth
MLT · 39 · 1 · 0 · 15 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov, Jan Ebert, Jiangtao Wang, Stefan Kesselheim
36 · 3 · 0 · 10 Jan 2025
Controlled Descent Training
Viktor Andersson, B. Varga, Vincent Szolnoky, Andreas Syrén, Rebecka Jörnsten, Balázs Kulcsár
43 · 1 · 0 · 16 Mar 2023
Bayesian Generational Population-Based Training
Xingchen Wan, Cong Lu, Jack Parker-Holder, Philip J. Ball, Vu-Linh Nguyen, Binxin Ru, Michael A. Osborne
OffRL · 31 · 15 · 0 · 19 Jul 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang, J. E. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, J. Pachocki, Weizhu Chen, Jianfeng Gao
26 · 148 · 0 · 07 Mar 2022
Optimal learning rate schedules in high-dimensional non-convex optimization problems
Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli
16 · 7 · 0 · 09 Feb 2022
Resource-Aware Pareto-Optimal Automated Machine Learning Platform
Yao Yang, Andrew Nam, M. Nasr-Azadani, Teresa Tung
16 · 6 · 0 · 30 Oct 2020
It's Hard for Neural Networks To Learn the Game of Life
Jacob Mitchell Springer, Garrett Kenyon
16 · 21 · 0 · 03 Sep 2020
New Interpretations of Normalization Methods in Deep Learning
Jiacheng Sun, Xiangyong Cao, Hanwen Liang, Weiran Huang, Zewei Chen, Zhenguo Li
21 · 34 · 0 · 16 Jun 2020
To Each Optimizer a Norm, To Each Norm its Generalization
Sharan Vaswani, Reza Babanezhad, Jose Gallego, Aaron Mishkin, Simon Lacoste-Julien, Nicolas Le Roux
26 · 8 · 0 · 11 Jun 2020
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
ODL · 159 · 234 · 0 · 04 Mar 2020
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De, Samuel L. Smith
ODL · 14 · 20 · 0 · 24 Feb 2020
On the infinite width limit of neural networks with a standard parameterization
Jascha Narain Sohl-Dickstein, Roman Novak, S. Schoenholz, Jaehoon Lee
24 · 47 · 0 · 21 Jan 2020
Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
Sanjeev Arora, S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu
AAML · 14 · 161 · 0 · 03 Oct 2019
The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks
Ryo Karakida, S. Akaho, S. Amari
21 · 39 · 0 · 07 Jun 2019
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras, S. Laine, Timo Aila
282 · 10,354 · 0 · 12 Dec 2018
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL · 287 · 2,890 · 0 · 15 Sep 2016