Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

30 October 2017
Pratik Chaudhari, Stefano Soatto
arXiv: 1710.11029 (abs | PDF | HTML)

Papers citing "Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks"

Showing 50 of 112 citing papers:
• Models of Heavy-Tailed Mechanistic Universality. Liam Hodgkinson, Zhichao Wang, Michael W. Mahoney. 04 Jun 2025.
• SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training. Ildus Sadrtdinov, Ivan Klimov, E. Lobacheva, Dmitry Vetrov. 29 May 2025.
• An Analytical Characterization of Sloppiness in Neural Networks: Insights from Linear Models. Jialin Mao, Itay Griniasty, Yan Sun, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari. 13 May 2025.
• SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training. Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zhangyang Wang, Shiwei Liu. 12 Jan 2025.
• Extended convexity and smoothness and their applications in deep learning. Binchuan Qi, Wei Gong, Li Li. 08 Oct 2024.
• Enhancing selectivity using Wasserstein distance based reweighing. Pratik Worah. 21 Jan 2024.
• Machine learning in and out of equilibrium. Shishir Adhikari, Alkan Kabakçıoğlu, A. Strang, Deniz Yuret, M. Hinczewski. 06 Jun 2023.
• The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold. Jialin Mao, Itay Griniasty, H. Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari. 02 May 2023.
• Revisiting the Noise Model of Stochastic Gradient Descent. Barak Battash, Ofir Lindenbaum. 05 Mar 2023.
• Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning. Antonio Sclocchi, Mario Geiger, Matthieu Wyart. 31 Jan 2023.
• An SDE for Modeling SAM: Theory and Insights. Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, F. Proske, Hans Kersting, Aurelien Lucchi. 19 Jan 2023.
• Training trajectories, mini-batch losses and the curious role of the learning rate. Mark Sandler, A. Zhmoginov, Max Vladymyrov, Nolan Miller. 05 Jan 2023.
• Accelerating Self-Supervised Learning via Efficient Training Strategies. Mustafa Taha Koçyiğit, Timothy M. Hospedales, Hakan Bilen. 11 Dec 2022.
• A picture of the space of typical learnable tasks. Rahul Ramesh, Jialin Mao, Itay Griniasty, Rubing Yang, H. Teoh, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari. 31 Oct 2022.
• A note on diffusion limits for stochastic gradient descent. Alberto Lanconelli, Christopher S. A. Lauria. 20 Oct 2022.
• On Quantum Speedups for Nonconvex Optimization via Quantum Tunneling Walks. Yizhou Liu, Weijie J. Su, Tongyang Li. 29 Sep 2022.
• PoF: Post-Training of Feature Extractor for Improving Generalization. Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami. 05 Jul 2022.
• Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger. Zhiqi Bu, Yu Wang, Sheng Zha, George Karypis. 14 Jun 2022.
• Trajectory-dependent Generalization Bounds for Deep Neural Networks via Fractional Brownian Motion. Chengli Tan, Jiang Zhang, Junmin Liu. 09 Jun 2022.
• Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility. Hoileong Lee, Fadhel Ayed, Paul Jung, Juho Lee, Hongseok Yang, François Caron. 17 May 2022.
• Balanced Multimodal Learning via On-the-fly Gradient Modulation. Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu. 29 Mar 2022.
• Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry. Fabrizio Pittorino, Antonio Ferraro, Gabriele Perugini, Christoph Feinauer, Carlo Baldassi, R. Zecchina. 07 Feb 2022.
• On Large Batch Training and Sharp Minima: A Fokker-Planck Perspective. Xiaowu Dai, Yuhua Zhu. 02 Dec 2021.
• Kalman filters as the steady-state solution of gradient descent on variational free energy. M. Baltieri, Takuya Isomura. 20 Nov 2021.
• Does the Data Induce Capacity Control in Deep Learning? Rubing Yang, Jialin Mao, Pratik Chaudhari. 27 Oct 2021.
• On the Regularization of Autoencoders. Harald Steck, Dario Garcia-Garcia. 21 Oct 2021.
• Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations. Jiayao Zhang, Hua Wang, Weijie J. Su. 11 Oct 2021.
• Stochastic Training is Not Necessary for Generalization. Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein. 29 Sep 2021.
• Neural TMDlayer: Modeling Instantaneous flow of features via SDE Generators. Zihang Meng, Vikas Singh, Sathya Ravi. 19 Aug 2021.
• On the Hyperparameters in Stochastic Gradient Descent with Momentum. Bin Shi. 09 Aug 2021.
• The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion. D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins. 19 Jul 2021.
• The Bayesian Learning Rule. Mohammad Emtiyaz Khan, Håvard Rue. 09 Jul 2021.
• Implicit Gradient Alignment in Distributed and Federated Learning. Yatin Dandi, Luis Barba, Martin Jaggi. 25 Jun 2021.
• Repulsive Deep Ensembles are Bayesian. Francesco D'Angelo, Vincent Fortuin. 22 Jun 2021.
• Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error. Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith. 27 May 2021.
• Lifelong Learning with Sketched Structural Regularization. Haoran Li, A. Krishnan, Jingfeng Wu, Soheil Kolouri, Praveen K. Pilly, Vladimir Braverman. 17 Apr 2021.
• On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs). Zhiyuan Li, Sadhika Malladi, Sanjeev Arora. 24 Feb 2021.
• SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality. Courtney Paquette, Kiwon Lee, Fabian Pedregosa, Elliot Paquette. 08 Feb 2021.
• On the Origin of Implicit Regularization in Stochastic Gradient Descent. Samuel L. Smith, Benoit Dherin, David Barrett, Soham De. 28 Jan 2021.
• Phases of learning dynamics in artificial neural networks: with or without mislabeled data. Yu Feng, Y. Tu. 16 Jan 2021.
• Recent advances in deep learning theory. Fengxiang He, Dacheng Tao. 20 Dec 2020.
• Emergent Quantumness in Neural Networks. M. Katsnelson, V. Vanchurin. 09 Dec 2020.
• Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics. D. Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka. 08 Dec 2020.
• Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent. Kangqiao Liu, Liu Ziyin, Masahito Ueda. 07 Dec 2020.
• Inductive Biases for Deep Learning of Higher-Level Cognition. Anirudh Goyal, Yoshua Bengio. 30 Nov 2020.
• Positive-Congruent Training: Towards Regression-Free Model Updates. Sijie Yan, Yuanjun Xiong, Kaustav Kundu, Shuo Yang, Siqi Deng, Meng Wang, Wei Xia, Stefano Soatto. 18 Nov 2020.
• Chaos and Complexity from Quantum Neural Network: A study with Diffusion Metric in Machine Learning. S. Choudhury, Ankan Dutta, Debisree Ray. 16 Nov 2020.
• Geometry Perspective Of Estimating Learning Capability Of Neural Networks. Ankan Dutta, Arnab Rakshit. 03 Nov 2020.
• Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning. Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Guosheng Lin, Weinan E. 12 Oct 2020.
• Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate. Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora. 06 Oct 2020.