A Study of Gradient Variance in Deep Learning

9 July 2020

Fartash Faghri

David Duvenaud

David J. Fleet

Jimmy Ba

FedML

ODL

Papers citing "A Study of Gradient Variance in Deep Learning"

22 / 22 papers shown

Layer-Aware Influence for Online Data Valuation Estimation

312

14 Oct 2025

Insights from Gradient Dynamics: Gradient Autoscaled Normalization

Vincent-Daniel Yun

268

03 Sep 2025

FedDuA: Doubly Adaptive Federated Learning

336

16 May 2025

Data value estimation on private gradients

Zijian Zhou

Xinyi Xu

Daniela Rus

Bryan Kian Hsiang Low

397

22 Dec 2024

Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers

Neural Information Processing Systems (NeurIPS), 2024

412

01 Nov 2024

Stable Language Model Pre-training by Reducing Embedding Variability

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

James Thorne

217

12 Sep 2024

Multiple Importance Sampling for Stochastic Gradient Estimation

Niloy J. Mitra

278

22 Jul 2024

On the Limitations of Compute Thresholds as a Governance Strategy

Sara Hooker

508

08 Jul 2024

Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning

457

29 May 2024

Grad Queue : A probabilistic framework to reinforce sparse gradients

Irfan Mohammad Al Hasib

260

25 Apr 2024

Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation

256

02 Aug 2023

An Experimental Study of Byzantine-Robust Aggregation Schemes in Federated Learning

IEEE Transactions on Big Data (IEEE Trans. Big Data), 2023

306

101

14 Feb 2023

Low-Variance Forward Gradients using Direct Feedback Alignment and Momentum

Neural Networks (NN), 2022

Florian Bacho

Dominique F. Chu

364

14 Dec 2022

Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics

International Conference on Learning Representations (ICLR), 2022

Shoaib Ahmed Siddiqui

305

20 Sep 2022

On the Interpretability of Regularisation for Neural Networks Through Model Gradient Similarity

Neural Information Processing Systems (NeurIPS), 2022

175

25 May 2022

MSTGD: A Memory Stochastic sTratified Gradient Descent Method with an Exponential Convergence Rate

244

21 Feb 2022

On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications

Ziqiao Wang

Yongyi Mao

FedML MLT

341

07 Oct 2021

Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization

International Conference on Machine Learning (ICML), 2021

479

259

07 Sep 2021

A Tale Of Two Long Tails

201

27 Jul 2021

Rethinking Adam: A Twofold Exponential Moving Average Approach

Huan Wang

242

22 Jun 2021

Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks

Neural Information Processing Systems (NeurIPS), 2021

Frank Schneider

Felix Dangel

Philipp Hennig

273

12 Feb 2021

Estimating Example Difficulty Using Variance of Gradients

Computer Vision and Pattern Recognition (CVPR), 2020

Chirag Agarwal

Daniel D'souza

Sara Hooker

747

130

26 Aug 2020