Loss Gradient Gaussian Width based Generalization and Optimization Guarantees

11 June 2024
A. Banerjee
Qiaobo Li
Yingxue Zhou

Papers citing "Loss Gradient Gaussian Width based Generalization and Optimization Guarantees"

50 / 54 papers shown
Sharpness-Aware Minimization Leads to Low-Rank Features
Neural Information Processing Systems (NeurIPS), 2023
Maksym Andriushchenko
Dara Bahri
H. Mobahi
Nicolas Flammarion
AAML
454
39
0
25 May 2023
Restricted Strong Convexity of Deep Learning Models with Smooth Activations
International Conference on Learning Representations (ICLR), 2022
A. Banerjee
Pedro Cisneros-Velarde
Libin Zhu
M. Belkin
348
11
0
29 Sep 2022
Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization
Neural Information Processing Systems (NeurIPS), 2022
I Zaghloul Amir
Roi Livni
Nathan Srebro
328
7
0
27 Feb 2022
On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie
Qian-Yuan Tang
Yunfeng Cai
Mingming Sun
P. Li
ODL
248
12
0
31 Jan 2022
The Risks of Invariant Risk Minimization
Elan Rosenfeld
Pradeep Ravikumar
Andrej Risteski
OOD
543
352
0
12 Oct 2020
Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks
Yikai Wu
Xingyu Zhu
Chenwei Wu
Annie Wang
Rong Ge
640
54
0
08 Oct 2020
Sharpness-Aware Minimization for Efficiently Improving Generalization
International Conference on Learning Representations (ICLR), 2020
Pierre Foret
Ariel Kleiner
H. Mobahi
Behnam Neyshabur
AAML
984
1,843
0
03 Oct 2020
On the linearity of large non-linear models: when and why the tangent kernel is constant
Neural Information Processing Systems (NeurIPS), 2020
Chaoyue Liu
Libin Zhu
M. Belkin
636
167
0
02 Oct 2020
FetchSGD: Communication-Efficient Federated Learning with Sketching
International Conference on Machine Learning (ICML), 2020
D. Rothchild
Ashwinee Panda
Enayat Ullah
Nikita Ivkin
Ion Stoica
Vladimir Braverman
Joseph E. Gonzalez
Raman Arora
FedML
358
424
0
15 Jul 2020
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Applied and Computational Harmonic Analysis (ACHA), 2020
Chaoyue Liu
Libin Zhu
M. Belkin
ODL
418
327
0
29 Feb 2020
Closing the convergence gap of SGD without replacement
International Conference on Machine Learning (ICML), 2020
Shashank Rajput
Anant Gupta
Dimitris Papailiopoulos
838
72
0
24 Feb 2020
PyHessian: Neural Networks Through the Lens of the Hessian
Z. Yao
A. Gholami
Kurt Keutzer
Michael W. Mahoney
ODL
490
367
0
16 Dec 2019
In Defense of Uniform Convergence: Generalization via derandomization with an application to interpolating predictors
International Conference on Machine Learning (ICML), 2019
Jeffrey Negrea
Gintare Karolina Dziugaite
Daniel M. Roy
AI4CE
358
66
0
09 Dec 2019
A Rademacher Complexity Based Method for Controlling Power and Confidence Level in Adaptive Statistical Analysis
International Conference on Data Science and Advanced Analytics (DSAA), 2019
L. Stefani
E. Upfal
255
9
0
04 Oct 2019
A New Analysis of Differential Privacy's Generalization Guarantees
Information Technology Convergence and Services (ITCS), 2019
Christopher Jung
Katrina Ligett
Seth Neel
Aaron Roth
Saeed Sharifi-Malvajerdi
Moshe Shenfeld
FedML
337
54
0
09 Sep 2019
How Good is SGD with Random Shuffling?
Annual Conference Computational Learning Theory (COLT), 2019
Itay Safran
Ohad Shamir
730
94
0
31 Jul 2019
Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
SDM (SDM), 2019
Xinyan Li
Qilong Gu
Yingxue Zhou
Tiancong Chen
A. Banerjee
ODL
290
57
0
24 Jul 2019
Kernel and Rich Regimes in Overparametrized Models
Annual Conference Computational Learning Theory (COLT), 2019
Blake E. Woodworth
Suriya Gunasekar
Pedro H. P. Savarese
E. Moroshko
Itay Golan
Jason D. Lee
Daniel Soudry
Nathan Srebro
546
407
0
13 Jun 2019
On Exact Computation with an Infinitely Wide Neural Net
Sanjeev Arora
S. Du
Wei Hu
Zhiyuan Li
Ruslan Salakhutdinov
Ruosong Wang
828
1,023
0
26 Apr 2019
On the Convergence of Adam and Beyond
Sashank J. Reddi
Satyen Kale
Surinder Kumar
1.3K
2,864
0
19 Apr 2019
Communication-efficient distributed SGD with Sketching
Nikita Ivkin
D. Rothchild
Enayat Ullah
Vladimir Braverman
Ion Stoica
R. Arora
FedML
382
226
0
12 Mar 2019
Uniform convergence may be unable to explain generalization in deep learning
Neural Information Processing Systems (NeurIPS), 2019
Vaishnavh Nagarajan
J. Zico Kolter
MoMe
AI4CE
612
351
0
13 Feb 2019
An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Behrooz Ghorbani
Shankar Krishnan
Ying Xiao
ODL
510
407
0
29 Jan 2019
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Sanjeev Arora
S. Du
Wei Hu
Zhiyuan Li
Ruosong Wang
MLT
864
1,050
0
24 Jan 2019
Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians
Vardan Papyan
239
92
0
24 Jan 2019
Gradient Descent Happens in a Tiny Subspace
Guy Gur-Ari
Daniel A. Roberts
Ethan Dyer
387
278
0
12 Dec 2018
Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Difan Zou
Yuan Cao
Dongruo Zhou
Quanquan Gu
ODL
621
450
0
21 Nov 2018
A Convergence Theory for Deep Learning via Over-Parameterization
International Conference on Machine Learning (ICML), 2018
Zeyuan Allen-Zhu
Yuanzhi Li
Zhao Song
AI4CE
ODL
1.8K
1,593
0
09 Nov 2018
Gradient Descent Finds Global Minima of Deep Neural Networks
International Conference on Machine Learning (ICML), 2018
S. Du
Jason D. Lee
Haochuan Li
Liwei Wang
Masayoshi Tomizuka
ODL
1.5K
1,224
0
09 Nov 2018
Uniform Convergence of Gradients for Non-Convex Learning and Optimization
Dylan J. Foster
Ayush Sekhari
Karthik Sridharan
341
78
0
25 Oct 2018
Graphical Convergence of Subgradients in Nonconvex Optimization and Learning
Damek Davis
Dmitriy Drusvyatskiy
208
29
0
17 Oct 2018
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
S. Du
Xiyu Zhai
Barnabás Póczós
Aarti Singh
MLT
ODL
885
1,358
0
04 Oct 2018
Random Shuffling Beats SGD after Finite Epochs
International Conference on Machine Learning (ICML), 2018
Jeff Z. HaoChen
S. Sra
290
108
0
26 Jun 2018
Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot
Franck Gabriel
Clément Hongler
3.5K
3,892
0
20 Jun 2018
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma
Raef Bassily
M. Belkin
415
323
0
18 Dec 2017
Size-Independent Sample Complexity of Neural Networks
Noah Golowich
Alexander Rakhlin
Ohad Shamir
661
618
0
18 Dec 2017
Spectrally-normalized margin bounds for neural networks
Peter L. Bartlett
Dylan J. Foster
Matus Telgarsky
ODL
989
1,404
0
26 Jun 2017
Understanding deep learning requires rethinking generalization
Chiyuan Zhang
Samy Bengio
Moritz Hardt
Benjamin Recht
Oriol Vinyals
HAI
958
5,031
0
10 Nov 2016
The Landscape of Empirical Risk for Non-convex Losses
Song Mei
Yu Bai
Andrea Montanari
462
326
0
22 Jul 2016
Gaussian Error Linear Units (GELUs)
Dan Hendrycks
Kevin Gimpel
1.7K
6,642
0
27 Jun 2016
Optimization Methods for Large-Scale Machine Learning
Léon Bottou
Frank E. Curtis
J. Nocedal
1.1K
3,746
0
15 Jun 2016
Bounds for Vector-Valued Function Estimation
Andreas Maurer
Massimiliano Pontil
193
7
0
05 Jun 2016
A vector-contraction inequality for Rademacher complexities
Andreas Maurer
345
295
0
01 May 2016
Train faster, generalize better: Stability of stochastic gradient descent
Moritz Hardt
Benjamin Recht
Y. Singer
560
1,400
0
03 Sep 2015
Generalization in Adaptive Data Analysis and Holdout Reuse
Neural Information Processing Systems (NeurIPS), 2015
Cynthia Dwork
Vitaly Feldman
Moritz Hardt
T. Pitassi
Omer Reingold
Aaron Roth
311
252
0
08 Jun 2015
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
VLM
1.4K
20,336
0
06 Feb 2015
An Introduction to Matrix Concentration Inequalities
J. Tropp
864
1,275
0
07 Jan 2015
Preserving Statistical Validity in Adaptive Data Analysis
Symposium on the Theory of Computing (STOC), 2014
Cynthia Dwork
Vitaly Feldman
Moritz Hardt
T. Pitassi
Omer Reingold
Aaron Roth
425
405
0
10 Nov 2014
Interactive Fingerprinting Codes and the Hardness of Preventing False Discovery
Information Theory and Applications Workshop (ITA), 2014
Thomas Steinke
Jonathan R. Ullman
284
115
0
05 Oct 2014
Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds
Raef Bassily
Adam D. Smith
Abhradeep Thakurta
FedML
493
362
0
27 May 2014