Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

30 October 2017
Pratik Chaudhari, Stefano Soatto
arXiv: 1710.11029 (abs | PDF | HTML)

Papers citing "Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks"

Showing 50 of 112 citing papers:
• Models of Heavy-Tailed Mechanistic Universality. Liam Hodgkinson, Zhichao Wang, Michael W. Mahoney. 04 Jun 2025.
• SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training. Ildus Sadrtdinov, Ivan Klimov, E. Lobacheva, Dmitry Vetrov. 29 May 2025.
• An Analytical Characterization of Sloppiness in Neural Networks: Insights from Linear Models. Jialin Mao, Itay Griniasty, Yan Sun, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari. 13 May 2025.
• SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training. Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zhangyang Wang, Shiwei Liu. 12 Jan 2025.
• Extended convexity and smoothness and their applications in deep learning. Binchuan Qi, Wei Gong, Li Li. 08 Oct 2024.
• Enhancing selectivity using Wasserstein distance based reweighing. Pratik Worah. 21 Jan 2024.
• Machine learning in and out of equilibrium. Shishir Adhikari, Alkan Kabakçıoğlu, A. Strang, Deniz Yuret, M. Hinczewski. 06 Jun 2023.
• The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold. Jialin Mao, Itay Griniasty, H. Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari. 02 May 2023.
• Revisiting the Noise Model of Stochastic Gradient Descent. Barak Battash, Ofir Lindenbaum. 05 Mar 2023.
• Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning. Antonio Sclocchi, Mario Geiger, Matthieu Wyart. 31 Jan 2023.
• An SDE for Modeling SAM: Theory and Insights. Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, F. Proske, Hans Kersting, Aurelien Lucchi. 19 Jan 2023.
• Training trajectories, mini-batch losses and the curious role of the learning rate. Mark Sandler, A. Zhmoginov, Max Vladymyrov, Nolan Miller. 05 Jan 2023.
• Accelerating Self-Supervised Learning via Efficient Training Strategies. Mustafa Taha Koçyiğit, Timothy M. Hospedales, Hakan Bilen. 11 Dec 2022.
• A picture of the space of typical learnable tasks. Rahul Ramesh, Jialin Mao, Itay Griniasty, Rubing Yang, H. Teoh, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari. 31 Oct 2022.
• A note on diffusion limits for stochastic gradient descent. Alberto Lanconelli, Christopher S. A. Lauria. 20 Oct 2022.
• On Quantum Speedups for Nonconvex Optimization via Quantum Tunneling Walks. Yizhou Liu, Weijie J. Su, Tongyang Li. 29 Sep 2022.
• PoF: Post-Training of Feature Extractor for Improving Generalization. Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami. 05 Jul 2022.
• Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger. Zhiqi Bu, Yu Wang, Sheng Zha, George Karypis. 14 Jun 2022.
• Trajectory-dependent Generalization Bounds for Deep Neural Networks via Fractional Brownian Motion. Chengli Tan, Jiang Zhang, Junmin Liu. 09 Jun 2022.
• Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility. Hoileong Lee, Fadhel Ayed, Paul Jung, Juho Lee, Hongseok Yang, François Caron. 17 May 2022.
• Balanced Multimodal Learning via On-the-fly Gradient Modulation. Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu. 29 Mar 2022.
• Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry. Fabrizio Pittorino, Antonio Ferraro, Gabriele Perugini, Christoph Feinauer, Carlo Baldassi, R. Zecchina. 07 Feb 2022.
• On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective. Xiaowu Dai, Yuhua Zhu. 02 Dec 2021.
• Kalman filters as the steady-state solution of gradient descent on variational free energy. M. Baltieri, Takuya Isomura. 20 Nov 2021.
• Does the Data Induce Capacity Control in Deep Learning? Rubing Yang, Jialin Mao, Pratik Chaudhari. 27 Oct 2021.
• On the Regularization of Autoencoders. Harald Steck, Dario Garcia-Garcia. 21 Oct 2021.
• Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations. Jiayao Zhang, Hua Wang, Weijie J. Su. 11 Oct 2021.
• Stochastic Training is Not Necessary for Generalization. Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein. 29 Sep 2021.
• Neural TMDlayer: Modeling Instantaneous flow of features via SDE Generators. Zihang Meng, Vikas Singh, Sathya Ravi. 19 Aug 2021.
• On the Hyperparameters in Stochastic Gradient Descent with Momentum. Bin Shi. 09 Aug 2021.
• The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion. D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins. 19 Jul 2021.
• The Bayesian Learning Rule. Mohammad Emtiyaz Khan, Håvard Rue. 09 Jul 2021.
• Implicit Gradient Alignment in Distributed and Federated Learning. Yatin Dandi, Luis Barba, Martin Jaggi. 25 Jun 2021.
• Repulsive Deep Ensembles are Bayesian. Francesco D'Angelo, Vincent Fortuin. 22 Jun 2021.
• Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error. Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith. 27 May 2021.
• Lifelong Learning with Sketched Structural Regularization. Haoran Li, A. Krishnan, Jingfeng Wu, Soheil Kolouri, Praveen K. Pilly, Vladimir Braverman. 17 Apr 2021.
• On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs). Zhiyuan Li, Sadhika Malladi, Sanjeev Arora. 24 Feb 2021.
• SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality. Courtney Paquette, Kiwon Lee, Fabian Pedregosa, Elliot Paquette. 08 Feb 2021.
• On the Origin of Implicit Regularization in Stochastic Gradient Descent. Samuel L. Smith, Benoit Dherin, David Barrett, Soham De. 28 Jan 2021.
• Phases of learning dynamics in artificial neural networks: with or without mislabeled data. Yu Feng, Y. Tu. 16 Jan 2021.
• Recent advances in deep learning theory. Fengxiang He, Dacheng Tao. 20 Dec 2020.
• Emergent Quantumness in Neural Networks. M. Katsnelson, V. Vanchurin. 09 Dec 2020.
• Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics. D. Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka. 08 Dec 2020.
• Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent. Kangqiao Liu, Liu Ziyin, Masahito Ueda. 07 Dec 2020.
• Inductive Biases for Deep Learning of Higher-Level Cognition. Anirudh Goyal, Yoshua Bengio. 30 Nov 2020.
• Positive-Congruent Training: Towards Regression-Free Model Updates. Sijie Yan, Yuanjun Xiong, Kaustav Kundu, Shuo Yang, Siqi Deng, Meng Wang, Wei Xia, Stefano Soatto. 18 Nov 2020.
• Chaos and Complexity from Quantum Neural Network: A study with Diffusion Metric in Machine Learning. S. Choudhury, Ankan Dutta, Debisree Ray. 16 Nov 2020.
• Geometry Perspective Of Estimating Learning Capability Of Neural Networks. Ankan Dutta, Arnab Rakshit. 03 Nov 2020.
• Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning. Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Guosheng Lin, Weinan E. 12 Oct 2020.
• Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate. Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora. 06 Oct 2020.