arXiv:2012.03636 · v4 (latest)
Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent
7 December 2020
Kangqiao Liu, Liu Ziyin, Masahito Ueda
MLT
Papers citing "Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent" (27 / 27 papers shown)
LLM-Assisted Modeling of Semantic Web-Enabled Multi-Agents Systems with AJAN
Hacane Hechehouche, Andre Antakli, Matthias Klusch
LLMAG, 3DV · 249 · 1 · 0 · 08 Oct 2025
SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training
Ildus Sadrtdinov, Ivan Klimov, E. Lobacheva, Dmitry Vetrov
290 · 3 · 0 · 29 May 2025
Homeostatic Ubiquity of Hebbian Dynamics in Regularized Learning Rules
David Koplow, Tomaso Poggio, Liu Ziyin
MLT, FedML · 403 · 1 · 0 · 23 May 2025
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
International Conference on Learning Representations (ICLR), 2024
Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan
AAML · 601 · 14 · 0 · 14 Oct 2024
How Learning Dynamics Drive Adversarially Robust Generalization?
Yuelin Xu, Xiao Zhang
AAML · 572 · 0 · 0 · 10 Oct 2024
Formation of Representations in Neural Networks
International Conference on Learning Representations (ICLR), 2024
Liu Ziyin, Isaac Chuang, Tomer Galanti, T. Poggio
603 · 13 · 0 · 03 Oct 2024
Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum
H. Cai, Sulaiman A. Alghunaim, Ali H. Sayed
503 · 1 · 0 · 18 Jun 2024
Do Parameters Reveal More than Loss for Membership Inference?
Anshuman Suri, Xiao Zhang, David Evans
MIACV, MIALM, AAML · 524 · 9 · 0 · 17 Jun 2024
Towards Understanding Inductive Bias in Transformers: A View From Infinity
Itay Lavie, Guy Gur-Ari, Zohar Ringel
383 · 10 · 0 · 07 Feb 2024
Weight fluctuations in (deep) linear neural networks and a derivation of the inverse-variance flatness relation
Physical Review Research (Phys. Rev. Res.), 2023
Markus Gross, A. Raulf, Christoph Räth
611 · 0 · 0 · 23 Nov 2023
A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
Mingze Wang, Lei Wu
521 · 5 · 0 · 01 Oct 2023
Exact Mean Square Linear Stability Analysis for SGD
Annual Conference Computational Learning Theory (COLT), 2023
Rotem Mulayoff, T. Michaeli
MLT · 319 · 4 · 0 · 13 Jun 2023
Anti-Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances in Flat Directions
Marcel Kühn, B. Rosenow
422 · 5 · 0 · 08 Jun 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
International Conference on Machine Learning (ICML), 2023
Tongtian Zhu, Fengxiang He, Kaixuan Chen, Weilong Dai, Dacheng Tao
774 · 21 · 0 · 05 Jun 2023
Enhance Diffusion to Improve Robust Generalization
Knowledge Discovery and Data Mining (KDD), 2023
Jianhui Sun, Sanchit Sinha, Aidong Zhang
355 · 4 · 0 · 05 Jun 2023
The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent
International Conference on Machine Learning (ICML), 2023
Lei Wu, Weijie J. Su
MLT · 376 · 39 · 0 · 27 May 2023
On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin
506 · 14 · 0 · 03 Feb 2023
On the Lipschitz Constant of Deep Networks and Double Descent
British Machine Vision Conference (BMVC), 2023
Matteo Gamba, Hossein Azizpour, Mårten Björkman
616 · 13 · 0 · 28 Jan 2023
On the Overlooked Structure of Stochastic Gradients
Neural Information Processing Systems (NeurIPS), 2022
Zeke Xie, Qian-Yuan Tang, Mingming Sun, P. Li
339 · 14 · 0 · 05 Dec 2022
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Conference on Uncertainty in Artificial Intelligence (UAI), 2022
Ziqiao Wang, Yongyi Mao
383 · 12 · 0 · 19 Nov 2022
Exact Solutions of a Deep Linear Network
Neural Information Processing Systems (NeurIPS), 2022
Liu Ziyin, Botao Li, Xiangming Meng
ODL · 701 · 27 · 0 · 10 Feb 2022
Stochastic Neural Networks with Infinite Width are Deterministic
Liu Ziyin, Hanlin Zhang, Xiangming Meng, Yuting Lu, Eric P. Xing, Masahito Ueda
335 · 3 · 0 · 30 Jan 2022
SGD with a Constant Large Learning Rate Can Converge to Local Maxima
Liu Ziyin, Botao Li, James B. Simon, Masahito Ueda
316 · 10 · 0 · 25 Jul 2021
The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
Neural Computation (Neural Comput.), 2021
D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins
603 · 20 · 0 · 19 Jul 2021
Power-law escape rate of SGD
International Conference on Machine Learning (ICML), 2021
Takashi Mori, Liu Ziyin, Kangqiao Liu, Masahito Ueda
265 · 25 · 0 · 20 May 2021
On the Distributional Properties of Adaptive Gradients
Conference on Uncertainty in Artificial Intelligence (UAI), 2021
Z. Zhiyi, Liu Ziyin
207 · 4 · 0 · 15 May 2021
Strength of Minibatch Noise in SGD
International Conference on Learning Representations (ICLR), 2021
Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda
ODL, MLT · 403 · 44 · 0 · 10 Feb 2021