ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

The Heavy-Tail Phenomenon in SGD (arXiv:2006.04740)
v5 (latest)

8 June 2020
Mert Gurbuzbalaban
Umut Simsekli
Lingjiong Zhu

Papers citing "The Heavy-Tail Phenomenon in SGD"

38 / 38 papers shown
Variational Learning Finds Flatter Solutions at the Edge of Stability
Avrajit Ghosh
Bai Cong
Rio Yokota
S. Ravishankar
Rongrong Wang
Molei Tao
Mohammad Emtiyaz Khan
Thomas Möllenhoff
15 Jun 2025
Complexity of normalized stochastic first-order methods with momentum under heavy-tailed noise
Chuan He
Zhaosong Lu
Defeng Sun
Zhanwang Deng
12 Jun 2025
Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias
Yuanzhe Hu
Kinshuk Goel
Vlad Killiakov
Yaoqing Yang
06 Jun 2025
Models of Heavy-Tailed Mechanistic Universality
Liam Hodgkinson
Zhichao Wang
Michael W. Mahoney
04 Jun 2025
Nonlinear Stochastic Gradient Descent and Heavy-tailed Noise: A Unified Framework and High-probability Guarantees
Aleksandar Armacki
Shuhua Yu
Pranay Sharma
Gauri Joshi
Dragana Bajović
D. Jakovetić
S. Kar
17 Oct 2024
Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets
Benjamin Dupuis
Paul Viallard
George Deligiannidis
Umut Simsekli
26 Apr 2024
Emergence of heavy tails in homogenized stochastic gradient descent
Zhe Jiao
Martin Keller-Ressel
02 Feb 2024
A Heavy-Tailed Algebra for Probabilistic Programming
Feynman T. Liang
Liam Hodgkinson
Michael W. Mahoney
15 Jun 2023
Type-II Saddles and Probabilistic Stability of Stochastic Gradient Descent
Liu Ziyin
Botao Li
Tomer Galanti
Masakuni Ueda
23 Mar 2023
Distributionally Robust Learning with Weakly Convex Losses: Convergence Rates and Finite-Sample Guarantees
Landi Zhu
Mert Gurbuzbalaban
A. Ruszczynski
16 Jan 2023
On the Overlooked Structure of Stochastic Gradients
Zeke Xie
Qian-Yuan Tang
Mingming Sun
P. Li
05 Dec 2022
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Ziqiao Wang
Yongyi Mao
19 Nov 2022
Taming Fat-Tailed ("Heavier-Tailed" with Potentially Infinite Variance) Noise in Federated Learning
Haibo Yang
Pei-Yuan Qiu
Jia Liu
03 Oct 2022
Tailoring to the Tails: Risk Measures for Fine-Grained Tail Sensitivity
Christian Frohlich
Robert C. Williamson
05 Aug 2022
Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility
Hoileong Lee
Fadhel Ayed
Paul Jung
Juho Lee
Hongseok Yang
François Caron
17 May 2022
Heavy-Tail Phenomenon in Decentralized SGD
Mert Gurbuzbalaban
Yuanhan Hu
Umut Simsekli
Kun Yuan
Lingjiong Zhu
13 May 2022
Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise
D. Jakovetić
Dragana Bajović
Anit Kumar Sahu
S. Kar
Nemanja Milošević
Dusan Stamenkovic
06 Apr 2022
A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization With Non-isolated Local Minima
Tae-Eon Ko
Xiantao Li
21 Mar 2022
Anticorrelated Noise Injection for Improved Generalization
Antonio Orvieto
Hans Kersting
F. Proske
Francis R. Bach
Aurelien Lucchi
06 Feb 2022
Impact of classification difficulty on the weight matrices spectra in Deep Learning and application to early-stopping
Xuran Meng
Jianfeng Yao
26 Nov 2021
Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks
Tolga Birdal
Aaron Lou
Leonidas Guibas
Umut Şimşekli
25 Nov 2021
A Unified and Refined Convergence Analysis for Non-Convex Decentralized Learning
Sulaiman A. Alghunaim
Kun Yuan
19 Oct 2021
SGD with a Constant Large Learning Rate Can Converge to Local Maxima
Liu Ziyin
Botao Li
James B. Simon
Masakuni Ueda
25 Jul 2021
On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control
Amrit Singh Bedi
Anjaly Parayil
Junyu Zhang
Mengdi Wang
Alec Koppel
15 Jun 2021
Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
A. Camuto
George Deligiannidis
Murat A. Erdogdu
Mert Gurbuzbalaban
Umut Şimşekli
Lingjiong Zhu
09 Jun 2021
Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks
Melih Barsbey
Romain Chor
Murat A. Erdogdu
Gaël Richard
Umut Simsekli
07 Jun 2021
Characterization of Generalizability of Spike Timing Dependent Plasticity trained Spiking Neural Networks
Biswadeep Chakraborty
Saibal Mukhopadhyay
31 May 2021
A Fully Spiking Hybrid Neural Network for Energy-Efficient Object Detection
Biswadeep Chakraborty
Xueyuan She
Saibal Mukhopadhyay
21 Apr 2021
Hessian Eigenspectra of More Realistic Nonlinear Models
Zhenyu Liao
Michael W. Mahoney
02 Mar 2021
Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
Hongjian Wang
Mert Gurbuzbalaban
Lingjiong Zhu
Umut Şimşekli
Murat A. Erdogdu
20 Feb 2021
Convergence of stochastic gradient descent schemes for Łojasiewicz-landscapes
Steffen Dereich
Sebastian Kassing
16 Feb 2021
Bayesian Neural Network Priors Revisited
Vincent Fortuin
Adrià Garriga-Alonso
Sebastian W. Ober
F. Wenzel
Gunnar Rätsch
Richard Turner
Mark van der Wilk
Laurence Aitchison
12 Feb 2021
SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality
Courtney Paquette
Kiwon Lee
Fabian Pedregosa
Elliot Paquette
08 Feb 2021
Robust, Accurate Stochastic Optimization for Variational Inference
Akash Kumar Dhaka
Alejandro Catalina
Michael Riis Andersen
Måns Magnusson
Jonathan H. Huggins
Aki Vehtari
01 Sep 2020
Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks
Umut Simsekli
Ozan Sener
George Deligiannidis
Murat A. Erdogdu
16 Jun 2020
Sharp Concentration Results for Heavy-Tailed Distributions
Milad Bakhshizadeh
A. Maleki
Víctor Peña
30 Mar 2020
Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise
Umut Simsekli
Lingjiong Zhu
Yee Whye Teh
Mert Gurbuzbalaban
13 Feb 2020
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
Charles H. Martin
Michael W. Mahoney
02 Oct 2018