Stochastic Training is Not Necessary for Generalization
arXiv: 2109.14119 · 29 September 2021
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein

Papers citing "Stochastic Training is Not Necessary for Generalization" (20 of 20 papers shown)

Gradient Descent as a Shrinkage Operator for Spectral Bias
Simon Lucey · 25 Apr 2025

Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller, Mark Dredze, Nicholas Andrews · 26 Aug 2024

Just How Flexible are Neural Networks in Practice?
Ravid Shwartz-Ziv, Micah Goldblum, Arpit Bansal, C. B. Bruss, Yann LeCun, Andrew Gordon Wilson · 17 Jun 2024

Enhancing Domain Adaptation through Prompt Gradient Alignment
Hoang Phan, Lam C. Tran, Quyen Tran, Trung Le · 13 Jun 2024

Neural Redshift: Random Networks are not Random Functions
Damien Teney, A. Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad · 04 Mar 2024

Why Does Little Robustness Help? Understanding and Improving Adversarial Transferability from Surrogate Training
Yechao Zhang, Shengshan Hu, Leo Yu Zhang, Junyu Shi, Minghui Li, Xiaogeng Liu, Wei Wan, Hai Jin · 15 Jul 2023 · AAML

DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi, Shahar Gottlieb, Moran Shkolnik, A. Karnieli, Ron Banner, Elad Hoffer, Kfir Y. Levy, Daniel Soudry · 18 Jun 2023

Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
Marcel Kühn, B. Rosenow · 08 Jun 2023

Do deep neural networks have an inbuilt Occam's razor?
Chris Mingard, Henry Rees, Guillermo Valle Pérez, A. Louis · 13 Apr 2023 · UQCV, BDL

mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder · 19 Feb 2023 · AAML

On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin · 03 Feb 2023

Disentangling the Mechanisms Behind Implicit Regularization in SGD
Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Chase Lipton · 29 Nov 2022 · FedML

How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization
Jonas Geiping, Micah Goldblum, Gowthami Somepalli, Ravid Shwartz-Ziv, Tom Goldstein, A. Wilson · 12 Oct 2022

Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD
Konstantinos E. Nikolakakis, Farzin Haddadpour, Amin Karbasi, Dionysios S. Kalogerias · 26 Apr 2022

Differentially Private Learning Needs Hidden State (Or Much Faster Convergence)
Jiayuan Ye, Reza Shokri · 10 Mar 2022 · FedML

Subspace Adversarial Training
Tao Li, Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang · 24 Nov 2021 · AAML, OOD

Logit Attenuating Weight Normalization
Aman Gupta, R. Ramanath, Jun Shi, Anika Ramachandran, Sirou Zhou, Mingzhou Zhou, S. Keerthi · 12 Aug 2021

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari · 04 Mar 2020 · ODL

Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He, Zhi-Li Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li · 04 Dec 2018

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · 15 Sep 2016 · ODL