ResearchTrend.AI
A Study of Gradient Variance in Deep Learning

9 July 2020
Fartash Faghri
David Duvenaud
David J. Fleet
Jimmy Ba
FedML ODL
arXiv: 2007.04532

Papers citing "A Study of Gradient Variance in Deep Learning"

22 / 22 papers shown
Layer-Aware Influence for Online Data Valuation Estimation
Ziao Yang
Longbo Huang
Hongfu Liu
TDI
312
0
0
14 Oct 2025
Insights from Gradient Dynamics: Gradient Autoscaled Normalization
Vincent-Daniel Yun
268
0
0
03 Sep 2025
FedDuA: Doubly Adaptive Federated Learning
Shokichi Takakura
Seng Pei Liew
Satoshi Hasegawa
FedML
336
0
0
16 May 2025
Data value estimation on private gradients
Zijian Zhou
Xinyi Xu
Daniela Rus
Bryan Kian Hsiang Low
397
1
0
22 Dec 2024
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
Neural Information Processing Systems (NeurIPS), 2024
Gavia Gray
Aman Tiwari
Shane Bergsma
Joel Hestness
412
5
0
01 Nov 2024
Stable Language Model Pre-training by Reducing Embedding Variability
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Woojin Chung
Jiwoo Hong
Na Min An
James Thorne
Se-Young Yun
217
6
0
12 Sep 2024
Multiple Importance Sampling for Stochastic Gradient Estimation
Corentin Salaün
Xingchang Huang
Iliyan Georgiev
Niloy J. Mitra
Gurprit Singh
278
2
0
22 Jul 2024
On the Limitations of Compute Thresholds as a Governance Strategy
Sara Hooker
508
30
0
08 Jul 2024
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
E. Chimoto
Jay Gala
Orevaoghene Ahia
Julia Kreutzer
Bruce A. Bassett
Sara Hooker
VLM
457
6
0
29 May 2024
Grad Queue : A probabilistic framework to reinforce sparse gradients
Irfan Mohammad Al Hasib
260
0
0
25 Apr 2024
Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation
Yongkang He
Mingjin Chen
Zhi-Yi Yang
Yongyi Lu
256
4
0
02 Aug 2023
An Experimental Study of Byzantine-Robust Aggregation Schemes in Federated Learning
IEEE Transactions on Big Data (IEEE Trans. Big Data), 2023
Shenghui Li
Edith C.H. Ngai
Thiemo Voigt
FedML AAML
306
101
0
14 Feb 2023
Low-Variance Forward Gradients using Direct Feedback Alignment and Momentum
Neural Networks (NN), 2022
Florian Bacho
Dominique F. Chu
364
10
0
14 Dec 2022
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics
International Conference on Learning Representations (ICLR), 2022
Shoaib Ahmed Siddiqui
Nitarshan Rajkumar
Tegan Maharaj
David M. Krueger
Sara Hooker
305
35
0
20 Sep 2022
On the Interpretability of Regularisation for Neural Networks Through Model Gradient Similarity
Neural Information Processing Systems (NeurIPS), 2022
Vincent Szolnoky
Viktor Andersson
Balázs Kulcsár
Rebecka Jörnsten
175
6
0
25 May 2022
MSTGD:A Memory Stochastic sTratified Gradient Descent Method with an Exponential Convergence Rate
Aixiang Chen
Chen
Jinting Zhang
Zanbo Zhang
Zhihong Li
244
0
0
21 Feb 2022
On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications
Ziqiao Wang
Yongyi Mao
FedML MLT
341
32
0
07 Oct 2021
Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization
International Conference on Machine Learning (ICML), 2021
Alexandre Ramé
Corentin Dancette
Matthieu Cord
OOD
479
259
0
07 Sep 2021
A Tale Of Two Long Tails
Daniel D'souza
Zach Nussbaum
Chirag Agarwal
Sara Hooker
201
25
0
27 Jul 2021
Rethinking Adam: A Twofold Exponential Moving Average Approach
Yizhou Wang
Yue Kang
Can Qin
Huan Wang
Yi Xu
Yulun Zhang
Y. Fu
ODL
242
8
0
22 Jun 2021
Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks
Neural Information Processing Systems (NeurIPS), 2021
Frank Schneider
Felix Dangel
Philipp Hennig
273
13
0
12 Feb 2021
Estimating Example Difficulty Using Variance of Gradients
Computer Vision and Pattern Recognition (CVPR), 2020
Chirag Agarwal
Daniel D'souza
Sara Hooker
747
130
0
26 Aug 2020