On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay

Neural Information Processing Systems (NeurIPS), 2021
29 June 2021
E. Lobacheva
M. Kodryan
Nadezhda Chirkova
A. Malinin
Dmitry Vetrov

Papers citing "On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay"

18 papers

Can Training Dynamics of Scale-Invariant Neural Networks Be Explained by the Thermodynamics of an Ideal Gas?
Ildus Sadrtdinov
E. Lobacheva
Ivan Klimov
Mikhail I. Katsnelson
Dmitry Vetrov
10 Nov 2025

SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training
Ildus Sadrtdinov
Ivan Klimov
E. Lobacheva
Dmitry Vetrov
29 May 2025

Where Do Large Learning Rates Lead Us?
Neural Information Processing Systems (NeurIPS), 2024
Ildus Sadrtdinov
M. Kodryan
Eduard Pokonechny
E. Lobacheva
Dmitry Vetrov
29 Oct 2024

Normalization and effective learning rates in reinforcement learning
Clare Lyle
Zeyu Zheng
Khimya Khetarpal
James Martens
H. V. Hasselt
Razvan Pascanu
Will Dabney
01 Jul 2024

Future Directions in the Theory of Graph Machine Learning
Christopher Morris
Fabrizio Frasca
Nadav Dym
Haggai Maron
İsmail İlkan Ceylan
Ron Levie
Derek Lim
Michael M. Bronstein
Martin Grohe
Stefanie Jegelka
03 Feb 2024

Large Learning Rates Improve Generalization: But How Large Are We Talking About?
E. Lobacheva
Eduard Pockonechnyy
M. Kodryan
Dmitry Vetrov
19 Nov 2023

From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression
Xuxing Chen
Krishnakumar Balasubramanian
Promit Ghosal
Bhavya Agrawalla
02 Oct 2023

Exploring Weight Balancing on Long-Tailed Recognition Problem
International Conference on Learning Representations (ICLR), 2024
Naoya Hasegawa
Issei Sato
26 May 2023

Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey
Yulong Wang
Tong Sun
Shenghong Li
Xinnan Yuan
W. Ni
Ekram Hossain
H. Vincent Poor
11 Mar 2023

On the Training Instability of Shuffling SGD with Batch Normalization
International Conference on Machine Learning (ICML), 2023
David Wu
Chulhee Yun
S. Sra
24 Feb 2023

Batchless Normalization: How to Normalize Activations with just one Instance in Memory
Benjamin Berger
30 Dec 2022

Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel
International Conference on Learning Representations (ICLR), 2023
Sungyub Kim
Si-hun Park
Kyungsu Kim
Eunho Yang
30 Sep 2022

Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
Neural Information Processing Systems (NeurIPS), 2022
M. Kodryan
E. Lobacheva
M. Nakhodnov
Dmitry Vetrov
08 Sep 2022

On the generalization of learning algorithms that do not converge
Neural Information Processing Systems (NeurIPS), 2022
N. Chandramoorthy
Andreas Loukas
Khashayar Gatmiry
Stefanie Jegelka
16 Aug 2022

Adapting the Linearised Laplace Model Evidence for Modern Deep Learning
International Conference on Machine Learning (ICML), 2022
Javier Antorán
David Janz
J. Allingham
Erik A. Daxberger
Riccardo Barbano
Eric T. Nalisnick
José Miguel Hernández-Lobato
17 Jun 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Neural Information Processing Systems (NeurIPS), 2022
Kaifeng Lyu
Zhiyuan Li
Sanjeev Arora
14 Jun 2022

Robust Training of Neural Networks Using Scale Invariant Architectures
International Conference on Machine Learning (ICML), 2022
Zhiyuan Li
Srinadh Bhojanapalli
Manzil Zaheer
Sashank J. Reddi
Sanjiv Kumar
02 Feb 2022

Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective
International Conference on Machine Learning (ICML), 2022
Jingzhao Zhang
Haochuan Li
S. Sra
Ali Jadbabaie
12 Oct 2021