2007.04532
A Study of Gradient Variance in Deep Learning
9 July 2020
Fartash Faghri
David Duvenaud
David J. Fleet
Jimmy Ba
FedML
ODL
Papers citing
"A Study of Gradient Variance in Deep Learning"
22 / 22 papers shown
Layer-Aware Influence for Online Data Valuation Estimation
Ziao Yang
Longbo Huang
Hongfu Liu
TDI
312
0
0
14 Oct 2025
Insights from Gradient Dynamics: Gradient Autoscaled Normalization
Vincent-Daniel Yun
268
0
0
03 Sep 2025
FedDuA: Doubly Adaptive Federated Learning
Shokichi Takakura
Seng Pei Liew
Satoshi Hasegawa
FedML
336
0
0
16 May 2025
Data value estimation on private gradients
Zijian Zhou
Xinyi Xu
Daniela Rus
Bryan Kian Hsiang Low
397
1
0
22 Dec 2024
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
Neural Information Processing Systems (NeurIPS), 2024
Gavia Gray
Aman Tiwari
Shane Bergsma
Joel Hestness
412
5
0
01 Nov 2024
Stable Language Model Pre-training by Reducing Embedding Variability
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Woojin Chung
Jiwoo Hong
Na Min An
James Thorne
Se-Young Yun
217
6
0
12 Sep 2024
Multiple Importance Sampling for Stochastic Gradient Estimation
Corentin Salaün
Xingchang Huang
Iliyan Georgiev
Niloy J. Mitra
Gurprit Singh
278
2
0
22 Jul 2024
On the Limitations of Compute Thresholds as a Governance Strategy
Sara Hooker
508
30
0
08 Jul 2024
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
E. Chimoto
Jay Gala
Orevaoghene Ahia
Julia Kreutzer
Bruce A. Bassett
Sara Hooker
VLM
457
6
0
29 May 2024
Grad Queue: A probabilistic framework to reinforce sparse gradients
Irfan Mohammad Al Hasib
260
0
0
25 Apr 2024
Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation
Yongkang He
Mingjin Chen
Zhi-Yi Yang
Yongyi Lu
256
4
0
02 Aug 2023
An Experimental Study of Byzantine-Robust Aggregation Schemes in Federated Learning
IEEE Transactions on Big Data (IEEE Trans. Big Data), 2023
Shenghui Li
Edith C.H. Ngai
Thiemo Voigt
FedML
AAML
306
101
0
14 Feb 2023
Low-Variance Forward Gradients using Direct Feedback Alignment and Momentum
Neural Networks (NN), 2022
Florian Bacho
Dominique F. Chu
364
10
0
14 Dec 2022
Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics
International Conference on Learning Representations (ICLR), 2022
Shoaib Ahmed Siddiqui
Nitarshan Rajkumar
Tegan Maharaj
David M. Krueger
Sara Hooker
305
35
0
20 Sep 2022
On the Interpretability of Regularisation for Neural Networks Through Model Gradient Similarity
Neural Information Processing Systems (NeurIPS), 2022
Vincent Szolnoky
Viktor Andersson
Balázs Kulcsár
Rebecka Jörnsten
175
6
0
25 May 2022
MSTGD: A Memory Stochastic sTratified Gradient Descent Method with an Exponential Convergence Rate
Aixiang Chen
Chen
Jinting Zhang
Zanbo Zhang
Zhihong Li
244
0
0
21 Feb 2022
On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications
Ziqiao Wang
Yongyi Mao
FedML
MLT
341
32
0
07 Oct 2021
Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization
International Conference on Machine Learning (ICML), 2021
Alexandre Ramé
Corentin Dancette
Matthieu Cord
OOD
479
259
0
07 Sep 2021
A Tale Of Two Long Tails
Daniel D'souza
Zach Nussbaum
Chirag Agarwal
Sara Hooker
201
25
0
27 Jul 2021
Rethinking Adam: A Twofold Exponential Moving Average Approach
Yizhou Wang
Yue Kang
Can Qin
Huan Wang
Yi Xu
Yulun Zhang
Y. Fu
ODL
242
8
0
22 Jun 2021
Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks
Neural Information Processing Systems (NeurIPS), 2021
Frank Schneider
Felix Dangel
Philipp Hennig
273
13
0
12 Feb 2021
Estimating Example Difficulty Using Variance of Gradients
Computer Vision and Pattern Recognition (CVPR), 2020
Chirag Agarwal
Daniel D'souza
Sara Hooker
747
130
0
26 Aug 2020