A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima

10 February 2020
Zeke Xie, Issei Sato, Masashi Sugiyama
ODL

Papers citing "A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima"

12 papers
1. Power-law Dynamic arising from machine learning
   Wei Chen, Weitao Du, Zhi-Ming Ma, Qi Meng
   16 Jun 2023

2. An SDE for Modeling SAM: Theory and Insights
   International Conference on Machine Learning (ICML), 2023
   Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, F. Proske, Hans Kersting, Aurelien Lucchi
   19 Jan 2023

3. Trajectory-dependent Generalization Bounds for Deep Neural Networks via Fractional Brownian Motion
   Chengli Tan, Jiang Zhang, Junmin Liu
   09 Jun 2022

4. Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD
   Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu
   26 Oct 2021

5. What Happens after SGD Reaches Zero Loss? -- A Mathematical Framework
   Zhiyuan Li, Tianhao Wang, Sanjeev Arora
   MLT
   13 Oct 2021

6. AdaL: Adaptive Gradient Transformation Contributes to Convergences and Generalizations
   Hongwei Zhang, Weidong Zou, Hongbo Zhao, Qi Ming, Tijin Yan, Yuanqing Xia, Weipeng Cao
   ODL
   04 Jul 2021

7. Combining resampling and reweighting for faithful stochastic optimization
   Communications in Mathematical Sciences (CMS), 2021
   Jing An, Lexing Ying
   31 May 2021

8. Partial local entropy and anisotropy in deep weight spaces
   Physical Review E (PRE), 2020
   Daniele Musso
   17 Jul 2020

9. Dynamic of Stochastic Gradient Descent with State-Dependent Noise
   Qi Meng, Shiqi Gong, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
   24 Jun 2020

10. Understanding the Role of Training Regimes in Continual Learning
    Neural Information Processing Systems (NeurIPS), 2020
    Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, H. Ghasemzadeh
    CLL
    12 Jun 2020

11. The large learning rate phase of deep learning: the catapult mechanism
    Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
    ODL
    04 Mar 2020

12. Where is the Information in a Deep Neural Network?
    Alessandro Achille, Giovanni Paolini, Stefano Soatto
    29 May 2019