arXiv:1710.06451 (v3)
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
17 October 2017
Samuel L. Smith
Quoc V. Le
BDL
Papers citing "A Bayesian Perspective on Generalization and Stochastic Gradient Descent" (50 of 108 shown)
Variational Learning Finds Flatter Solutions at the Edge of Stability [MLT]
Avrajit Ghosh, Bai Cong, Rio Yokota, S. Ravishankar, Rongrong Wang, Molei Tao, Mohammad Emtiyaz Khan, Thomas Möllenhoff. 15 Jun 2025.

Characterising the Inductive Biases of Neural Networks on Boolean Data [AI4CE]
Chris Mingard, Lukas Seier, Niclas Goring, Andrei-Vlad Badelita, Charles London, Ard A. Louis. 29 May 2025.

SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training
Ildus Sadrtdinov, Ivan Klimov, E. Lobacheva, Dmitry Vetrov. 29 May 2025.

LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
Xinyue Zeng, Haohui Wang, Junhong Lin, Jun Wu, Tyler Cody, Dawei Zhou. 01 May 2025.

Generalization through variance: how noise shapes inductive biases in diffusion models [DiffM]
John J. Vastola. 16 Apr 2025.

Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks [MLT]
Pierfrancesco Beneventano, Blake Woodworth. 15 Jan 2025.

Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov, Jan Ebert, Jiangtao Wang, Stefan Kesselheim. 10 Jan 2025.

Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar. 30 Dec 2024.

How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, Sham Kakade. 29 Oct 2024.

Continual learning with the neural tangent ensemble [UQCV]
Ari S. Benjamin, Christian Pehle, Kyle Daruwalla. 30 Aug 2024.

Spring-block theory of feature learning in deep neural networks [AI4CE]
Chengzhi Shi, Liming Pan, Ivan Dokmanić. 28 Jul 2024.

Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Shuaipeng Li, Penghao Zhao, Hailin Zhang, Xingwu Sun, Hao Wu, ..., Zheng Fang, Jinbao Xue, Yangyu Tao, Tengjiao Wang, Di Wang. 23 May 2024.

Variational Stochastic Gradient Descent for Deep Neural Networks
Haotian Chen, Anna Kuzina, Babak Esmaeili, Jakub M. Tomczak. 09 Apr 2024.

Information-Theoretic Generalization Bounds for Deep Neural Networks
Haiyun He, Christina Lee Yu. 04 Apr 2024.

Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms
Toki Tahmid Inan, Mingrui Liu, Amarda Shehu. 01 Mar 2024.

Emergence of heavy tails in homogenized stochastic gradient descent
Zhe Jiao, Martin Keller-Ressel. 02 Feb 2024.

FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information [3DGS]
Wen Jiang, Boshu Lei, Kostas Daniilidis. 29 Nov 2023.

The Pursuit of Human Labeling: A New Perspective on Unsupervised Learning
Artyom Gadetsky, Maria Brbić. 06 Nov 2023.

Flatness-Aware Minimization for Domain Generalization
Xingxuan Zhang, Renzhe Xu, Han Yu, Yancheng Dong, Pengfei Tian, Peng Cui. 20 Jul 2023.

Taming Resource Heterogeneity In Distributed ML Training With Dynamic Batching
S. Tyagi, Prateek Sharma. 20 May 2023.

mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization [AAML]
Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder. 19 Feb 2023.

Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
Antonio Sclocchi, Mario Geiger, Matthieu Wyart. 31 Jan 2023.

Effects of Data Geometry in Early Deep Learning
Saket Tiwari, George Konidaris. 29 Dec 2022.

Likelihood-based generalization of Markov parameter estimation and multiple shooting objectives in system identification
Nicholas Galioto, Alex Arkady Gorodetsky. 20 Dec 2022.

Error-aware Quantization through Noise Tempering [MQ]
Zheng Wang, Juncheng Billy Li, Shuhui Qu, Florian Metze, Emma Strubell. 11 Dec 2022.

ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning [MoMe]
Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen. 02 Dec 2022.

Task Discovery: Finding the Tasks that Neural Networks Generalize on [OOD]
Andrei Atanov, Andrei Filatov, Teresa Yeo, Ajay Sohmshetty, Amir Zamir. 01 Dec 2022.

On the Maximum Hessian Eigenvalue and Generalization
Simran Kaur, Jérémy E. Cohen, Zachary Chase Lipton. 21 Jun 2022.

Sharpness-Aware Minimization Improves Language Model Generalization
Dara Bahri, H. Mobahi, Yi Tay. 16 Oct 2021.

Implicit Gradient Alignment in Distributed and Federated Learning [FedML]
Yatin Dandi, Luis Barba, Martin Jaggi. 25 Jun 2021.

When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations [ViT]
Xiangning Chen, Cho-Jui Hsieh, Boqing Gong. 03 Jun 2021.

Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error
Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith. 27 May 2021.

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li, Sadhika Malladi, Sanjeev Arora. 24 Feb 2021.

On the Origin of Implicit Regularization in Stochastic Gradient Descent [MLT]
Samuel L. Smith, Benoit Dherin, David Barrett, Soham De. 28 Jan 2021.

Robustness, Privacy, and Generalization of Adversarial Training
Fengxiang He, Shaopeng Fu, Bohan Wang, Dacheng Tao. 25 Dec 2020.

Recent advances in deep learning theory [AI4CE]
Fengxiang He, Dacheng Tao. 20 Dec 2020.

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
D. Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka. 08 Dec 2020.

Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent [MLT]
Kangqiao Liu, Liu Ziyin, Masakuni Ueda. 07 Dec 2020.

Inductive Biases for Deep Learning of Higher-Level Cognition [AI4CE]
Anirudh Goyal, Yoshua Bengio. 30 Nov 2020.

Contrastive Weight Regularization for Large Minibatch SGD [OffRL]
Qiwei Yuan, Weizhe Hua, Yi Zhou, Cunxi Yu. 17 Nov 2020.

Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions
Jianan Wang, Boyang Albert Li, Xiangyu Fan, Jing-Hua Lin, Yanwei Fu. 15 Nov 2020.

A Bayesian Perspective on Training Speed and Model Selection
Clare Lyle, Lisa Schut, Binxin Ru, Y. Gal, Mark van der Wilk. 27 Oct 2020.

Deep Learning is Singular, and That's Good [UQCV]
Daniel Murfet, Susan Wei, Biwei Huang, Hui Li, Jesse Gell-Redman, T. Quella. 22 Oct 2020.

How Data Augmentation affects Optimization for Linear Regression
Boris Hanin, Yi Sun. 21 Oct 2020.

Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov. 14 Oct 2020.

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora. 06 Oct 2020.

Implicit Gradient Regularization
David Barrett, Benoit Dherin. 23 Sep 2020.

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning [VLM]
Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, G. Ganger, Eric Xing. 27 Aug 2020.

HydaLearn: Highly Dynamic Task Weighting for Multi-task Learning with Auxiliary Tasks
Sam Verboven, M. H. Chaudhary, Jeroen Berrevoets, Wouter Verbeke. 26 Aug 2020.

Intelligence plays dice: Stochasticity is essential for machine learning
M. Sabuncu. 17 Aug 2020.