2406.07712
Loss Gradient Gaussian Width based Generalization and Optimization Guarantees
11 June 2024
A. Banerjee
Qiaobo Li
Yingxue Zhou
Papers citing "Loss Gradient Gaussian Width based Generalization and Optimization Guarantees"
50 / 54 papers shown
Sharpness-Aware Minimization Leads to Low-Rank Features
Neural Information Processing Systems (NeurIPS), 2023
Maksym Andriushchenko
Dara Bahri
H. Mobahi
Nicolas Flammarion
AAML
454
39
0
25 May 2023
Restricted Strong Convexity of Deep Learning Models with Smooth Activations
International Conference on Learning Representations (ICLR), 2022
A. Banerjee
Pedro Cisneros-Velarde
Libin Zhu
M. Belkin
348
11
0
29 Sep 2022
Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization
Neural Information Processing Systems (NeurIPS), 2022
Idan Amir
Roi Livni
Nathan Srebro
328
7
0
27 Feb 2022
On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie
Qian-Yuan Tang
Yunfeng Cai
Mingming Sun
P. Li
ODL
248
12
0
31 Jan 2022
The Risks of Invariant Risk Minimization
Elan Rosenfeld
Pradeep Ravikumar
Andrej Risteski
OOD
543
352
0
12 Oct 2020
Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks
Yikai Wu
Xingyu Zhu
Chenwei Wu
Annie Wang
Rong Ge
640
54
0
08 Oct 2020
Sharpness-Aware Minimization for Efficiently Improving Generalization
International Conference on Learning Representations (ICLR), 2020
Pierre Foret
Ariel Kleiner
H. Mobahi
Behnam Neyshabur
AAML
984
1,843
0
03 Oct 2020
On the linearity of large non-linear models: when and why the tangent kernel is constant
Neural Information Processing Systems (NeurIPS), 2020
Chaoyue Liu
Libin Zhu
M. Belkin
636
167
0
02 Oct 2020
FetchSGD: Communication-Efficient Federated Learning with Sketching
International Conference on Machine Learning (ICML), 2020
D. Rothchild
Ashwinee Panda
Enayat Ullah
Nikita Ivkin
Ion Stoica
Vladimir Braverman
Joseph E. Gonzalez
Raman Arora
FedML
358
424
0
15 Jul 2020
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Applied and Computational Harmonic Analysis (ACHA), 2020
Chaoyue Liu
Libin Zhu
M. Belkin
ODL
418
327
0
29 Feb 2020
Closing the convergence gap of SGD without replacement
International Conference on Machine Learning (ICML), 2020
Shashank Rajput
Anant Gupta
Dimitris Papailiopoulos
838
72
0
24 Feb 2020
PyHessian: Neural Networks Through the Lens of the Hessian
Z. Yao
A. Gholami
Kurt Keutzer
Michael W. Mahoney
ODL
490
367
0
16 Dec 2019
In Defense of Uniform Convergence: Generalization via derandomization with an application to interpolating predictors
International Conference on Machine Learning (ICML), 2019
Jeffrey Negrea
Gintare Karolina Dziugaite
Daniel M. Roy
AI4CE
358
66
0
09 Dec 2019
A Rademacher Complexity Based Method for Controlling Power and Confidence Level in Adaptive Statistical Analysis
International Conference on Data Science and Advanced Analytics (DSAA), 2019
L. Stefani
E. Upfal
255
9
0
04 Oct 2019
A New Analysis of Differential Privacy's Generalization Guarantees
Innovations in Theoretical Computer Science (ITCS), 2019
Christopher Jung
Katrina Ligett
Seth Neel
Aaron Roth
Saeed Sharifi-Malvajerdi
Moshe Shenfeld
FedML
337
54
0
09 Sep 2019
How Good is SGD with Random Shuffling?
Annual Conference Computational Learning Theory (COLT), 2019
Itay Safran
Ohad Shamir
730
94
0
31 Jul 2019
Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
SIAM International Conference on Data Mining (SDM), 2019
Xinyan Li
Qilong Gu
Yingxue Zhou
Tiancong Chen
A. Banerjee
ODL
290
57
0
24 Jul 2019
Kernel and Rich Regimes in Overparametrized Models
Annual Conference Computational Learning Theory (COLT), 2019
Blake E. Woodworth
Suriya Gunasekar
Pedro H. P. Savarese
E. Moroshko
Itay Golan
Jason D. Lee
Daniel Soudry
Nathan Srebro
546
407
0
13 Jun 2019
On Exact Computation with an Infinitely Wide Neural Net
Sanjeev Arora
S. Du
Wei Hu
Zhiyuan Li
Ruslan Salakhutdinov
Ruosong Wang
828
1,023
0
26 Apr 2019
On the Convergence of Adam and Beyond
Sashank J. Reddi
Satyen Kale
Sanjiv Kumar
1.3K
2,864
0
19 Apr 2019
Communication-efficient distributed SGD with Sketching
Nikita Ivkin
D. Rothchild
Enayat Ullah
Vladimir Braverman
Ion Stoica
R. Arora
FedML
382
226
0
12 Mar 2019
Uniform convergence may be unable to explain generalization in deep learning
Neural Information Processing Systems (NeurIPS), 2019
Vaishnavh Nagarajan
J. Zico Kolter
MoMe
AI4CE
612
351
0
13 Feb 2019
An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Behrooz Ghorbani
Shankar Krishnan
Ying Xiao
ODL
510
407
0
29 Jan 2019
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Sanjeev Arora
S. Du
Wei Hu
Zhiyuan Li
Ruosong Wang
MLT
864
1,050
0
24 Jan 2019
Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians
Vardan Papyan
239
92
0
24 Jan 2019
Gradient Descent Happens in a Tiny Subspace
Guy Gur-Ari
Daniel A. Roberts
Ethan Dyer
387
278
0
12 Dec 2018
Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Difan Zou
Yuan Cao
Dongruo Zhou
Quanquan Gu
ODL
621
450
0
21 Nov 2018
A Convergence Theory for Deep Learning via Over-Parameterization
International Conference on Machine Learning (ICML), 2018
Zeyuan Allen-Zhu
Yuanzhi Li
Zhao Song
AI4CE
ODL
1.8K
1,593
0
09 Nov 2018
Gradient Descent Finds Global Minima of Deep Neural Networks
International Conference on Machine Learning (ICML), 2018
S. Du
Jason D. Lee
Haochuan Li
Liwei Wang
Xiyu Zhai
ODL
1.5K
1,224
0
09 Nov 2018
Uniform Convergence of Gradients for Non-Convex Learning and Optimization
Dylan J. Foster
Ayush Sekhari
Karthik Sridharan
341
78
0
25 Oct 2018
Graphical Convergence of Subgradients in Nonconvex Optimization and Learning
Damek Davis
Dmitriy Drusvyatskiy
208
29
0
17 Oct 2018
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
S. Du
Xiyu Zhai
Barnabás Póczos
Aarti Singh
MLT
ODL
885
1,358
0
04 Oct 2018
Random Shuffling Beats SGD after Finite Epochs
International Conference on Machine Learning (ICML), 2018
Jeff Z. HaoChen
S. Sra
290
108
0
26 Jun 2018
Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot
Franck Gabriel
Clément Hongler
3.5K
3,892
0
20 Jun 2018
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma
Raef Bassily
M. Belkin
415
323
0
18 Dec 2017
Size-Independent Sample Complexity of Neural Networks
Noah Golowich
Alexander Rakhlin
Ohad Shamir
661
618
0
18 Dec 2017
Spectrally-normalized margin bounds for neural networks
Peter L. Bartlett
Dylan J. Foster
Matus Telgarsky
ODL
989
1,404
0
26 Jun 2017
Understanding deep learning requires rethinking generalization
Chiyuan Zhang
Samy Bengio
Moritz Hardt
Benjamin Recht
Oriol Vinyals
HAI
958
5,031
0
10 Nov 2016
The Landscape of Empirical Risk for Non-convex Losses
Song Mei
Yu Bai
Andrea Montanari
462
326
0
22 Jul 2016
Gaussian Error Linear Units (GELUs)
Dan Hendrycks
Kevin Gimpel
1.7K
6,642
0
27 Jun 2016
Optimization Methods for Large-Scale Machine Learning
Léon Bottou
Frank E. Curtis
J. Nocedal
1.1K
3,746
0
15 Jun 2016
Bounds for Vector-Valued Function Estimation
Andreas Maurer
Massimiliano Pontil
193
7
0
05 Jun 2016
A vector-contraction inequality for Rademacher complexities
Andreas Maurer
345
295
0
01 May 2016
Train faster, generalize better: Stability of stochastic gradient descent
Moritz Hardt
Benjamin Recht
Y. Singer
560
1,400
0
03 Sep 2015
Generalization in Adaptive Data Analysis and Holdout Reuse
Neural Information Processing Systems (NeurIPS), 2015
Cynthia Dwork
Vitaly Feldman
Moritz Hardt
T. Pitassi
Omer Reingold
Aaron Roth
311
252
0
08 Jun 2015
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
VLM
1.4K
20,336
0
06 Feb 2015
An Introduction to Matrix Concentration Inequalities
J. Tropp
864
1,275
0
07 Jan 2015
Preserving Statistical Validity in Adaptive Data Analysis
Symposium on the Theory of Computing (STOC), 2014
Cynthia Dwork
Vitaly Feldman
Moritz Hardt
T. Pitassi
Omer Reingold
Aaron Roth
425
405
0
10 Nov 2014
Interactive Fingerprinting Codes and the Hardness of Preventing False Discovery
Information Theory and Applications Workshop (ITA), 2014
Thomas Steinke
Jonathan R. Ullman
284
115
0
05 Oct 2014
Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds
Raef Bassily
Adam D. Smith
Abhradeep Thakurta
FedML
493
362
0
27 May 2014