A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith, Quoc V. Le
BDL
17 October 2017

Papers citing "A Bayesian Perspective on Generalization and Stochastic Gradient Descent"
50 / 108 papers shown

Finite Versus Infinite Neural Networks: an Empirical Study
Jaehoon Lee, S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Narain Sohl-Dickstein
31 Jul 2020

Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Peng Jiang, G. Agrawal
13 Jul 2020

Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran
MLT
15 May 2020

Pipelined Backpropagation at Scale: Training Large Models without Batches
Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Koster
25 Mar 2020

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
ODL
04 Mar 2020

Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited
Wesley J. Maddox, Gregory W. Benton, A. Wilson
04 Mar 2020

The Implicit and Explicit Regularization Effects of Dropout
Colin Wei, Sham Kakade, Tengyu Ma
28 Feb 2020

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks
Soham De, Samuel L. Smith
ODL
24 Feb 2020

The Two Regimes of Deep Network Training
Guillaume Leclerc, Aleksander Madry
24 Feb 2020

Rethinking the Hyperparameters for Fine-tuning
Hao Li, Pratik Chaudhari, Hao Yang, Michael Lam, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto
VLM
19 Feb 2020

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie, Issei Sato, Masashi Sugiyama
ODL
10 Feb 2020

Optimized Generic Feature Learning for Few-shot Classification across Domains
Tonmoy Saikia, Thomas Brox, Cordelia Schmid
VLM
22 Jan 2020

'Place-cell' emergence and learning of invariant data with restricted Boltzmann machines: breaking and dynamical restoration of continuous symmetries in the weight space
Moshir Harsh, J. Tubiana, Simona Cocco, R. Monasson
30 Dec 2019

Linear Mode Connectivity and the Lottery Ticket Hypothesis
Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
MoMe
11 Dec 2019

Fantastic Generalization Measures and Where to Find Them
Yiding Jiang, Behnam Neyshabur, H. Mobahi, Dilip Krishnan, Samy Bengio
AI4CE
04 Dec 2019

Orchestrating the Development Lifecycle of Machine Learning-Based IoT Applications: A Taxonomy and Survey
Bin Qian, Jie Su, Z. Wen, D. N. Jha, Yinhao Li, ..., Albert Y. Zomaya, Omer F. Rana, Lizhe Wang, Maciej Koutny, R. Ranjan
11 Oct 2019

Beyond Human-Level Accuracy: Computational Challenges in Deep Learning
Joel Hestness, Newsha Ardalani, G. Diamos
03 Sep 2019

Deep Learning Theory Review: An Optimal Control and Dynamical Systems Perspective
Guan-Horng Liu, Evangelos A. Theodorou
AI4CE
28 Aug 2019

Towards Better Generalization: BP-SVRG in Training Deep Neural Networks
Hao Jin, Dachao Lin, Zhihua Zhang
ODL
18 Aug 2019

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Yuanzhi Li, Colin Wei, Tengyu Ma
10 Jul 2019

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets
Devansh Arpit, Victor Campos, Yoshua Bengio
05 Jun 2019

Dimensionality compression and expansion in Deep Neural Networks
Stefano Recanatesi, M. Farrell, Madhu S. Advani, Timothy Moore, Guillaume Lajoie, E. Shea-Brown
02 Jun 2019

Are All Layers Created Equal?
Chiyuan Zhang, Samy Bengio, Y. Singer
06 Feb 2019

Asymmetric Valleys: Beyond Sharp and Flat Local Minima
Haowei He, Gao Huang, Yang Yuan
ODL, MLT
02 Feb 2019

An Empirical Model of Large-Batch Training
Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team
14 Dec 2018

Parameter Re-Initialization through Cyclical Batch Size Schedules
Norman Mu, Z. Yao, A. Gholami, Kurt Keutzer, Michael W. Mahoney
ODL
04 Dec 2018

On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
Noah Golmant, N. Vemuri, Z. Yao, Vladimir Feinberg, A. Gholami, Kai Rothauge, Michael W. Mahoney, Joseph E. Gonzalez
30 Nov 2018

Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash
Hiroaki Mikami, Hisahiro Suganuma, Pongsakorn U-chupala, Yoshiki Tanaka, Yuichi Kageyama
13 Nov 2018

Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang
08 Nov 2018

Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks
Zhibin Liao, Tom Drummond, Ian Reid, G. Carneiro
16 Oct 2018

Large batch size training of neural networks with adversarial training and second-order information
Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney
ODL
02 Oct 2018

Fluctuation-dissipation relations for stochastic gradient descent
Sho Yaida
28 Sep 2018

Deep Bilevel Learning
Simon Jenni, Paolo Favaro
NoLa
05 Sep 2018

Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
22 Aug 2018

TherML: Thermodynamics of Machine Learning
Alexander A. Alemi, Ian S. Fischer
DRL, AI4CE
11 Jul 2018

Stochastic natural gradient descent draws posterior samples in function space
Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Narain Sohl-Dickstein
BDL
25 Jun 2018

PCA of high dimensional random walks with comparison to neural network training
J. Antognini, Jascha Narain Sohl-Dickstein
OOD
22 Jun 2018

Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach
Ryo Karakida, S. Akaho, S. Amari
FedML
04 Jun 2018

Understanding Batch Normalization
Johan Bjorck, Carla P. Gomes, B. Selman, Kilian Q. Weinberger
01 Jun 2018

Amortized Inference Regularization
Rui Shu, Hung Bui, Shengjia Zhao, Mykel J. Kochenderfer, Stefano Ermon
DRL
23 May 2018

Deep learning generalizes because the parameter-function map is biased towards simple functions
Guillermo Valle Pérez, Chico Q. Camargo, A. Louis
MLT, AI4CE
22 May 2018

SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
W. Wen, Yandan Wang, Feng Yan, Cong Xu, Chunpeng Wu, Yiran Chen, H. Li
21 May 2018

DNN or k-NN: That is the Generalize vs. Memorize Question
Gilad Cohen, Guillermo Sapiro, Raja Giryes
17 May 2018

Gaussian Process Behaviour in Wide Deep Neural Networks
A. G. Matthews, Mark Rowland, Jiri Hron, Richard Turner, Zoubin Ghahramani
BDL
30 Apr 2018

Revisiting Small Batch Training for Deep Neural Networks
Dominic Masters, Carlo Luschi
ODL
20 Apr 2018

A Study on Overfitting in Deep Reinforcement Learning
Chiyuan Zhang, Oriol Vinyals, Rémi Munos, Samy Bengio
OffRL, OnRL
18 Apr 2018

Training Tips for the Transformer Model
Martin Popel, Ondrej Bojar
01 Apr 2018

A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
L. Smith
26 Mar 2018

Gradient Descent Quantizes ReLU Network Features
Hartmut Maennel, Olivier Bousquet, Sylvain Gelly
MLT
22 Mar 2018

A Walk with SGD
Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio
24 Feb 2018