Loss Gradient Gaussian Width based Generalization and Optimization Guarantees

11 June 2024

A. Banerjee

Qiaobo Li

Yingxue Zhou

Papers citing "Loss Gradient Gaussian Width based Generalization and Optimization Guarantees"

50 / 54 papers shown

Sharpness-Aware Minimization Leads to Low-Rank Features

Neural Information Processing Systems (NeurIPS), 2023

Maksym Andriushchenko

454

25 May 2023

Restricted Strong Convexity of Deep Learning Models with Smooth Activations

International Conference on Learning Representations (ICLR), 2022

A. Banerjee

Pedro Cisneros-Velarde

Libin Zhu

M. Belkin

348

29 Sep 2022

Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization

Neural Information Processing Systems (NeurIPS), 2022

Idan Amir

Roi Livni

Nathan Srebro

328

27 Feb 2022

On the Power-Law Hessian Spectrums in Deep Learning

Zeke Xie

248

31 Jan 2022

The Risks of Invariant Risk Minimization

Elan Rosenfeld

Pradeep Ravikumar

Andrej Risteski

OOD

543

352

12 Oct 2020

Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks

640

08 Oct 2020

Sharpness-Aware Minimization for Efficiently Improving Generalization

International Conference on Learning Representations (ICLR), 2020

984

1,843

03 Oct 2020

On the linearity of large non-linear models: when and why the tangent kernel is constant

Neural Information Processing Systems (NeurIPS), 2020

Chaoyue Liu

Libin Zhu

M. Belkin

636

167

02 Oct 2020

FetchSGD: Communication-Efficient Federated Learning with Sketching

International Conference on Machine Learning (ICML), 2020

358

424

15 Jul 2020

Loss landscapes and optimization in over-parameterized non-linear systems and neural networks

Applied and Computational Harmonic Analysis (ACHA), 2020

418

327

29 Feb 2020

Closing the convergence gap of SGD without replacement

International Conference on Machine Learning (ICML), 2020

Shashank Rajput

Anant Gupta

Dimitris Papailiopoulos

838

24 Feb 2020

PyHessian: Neural Networks Through the Lens of the Hessian

490

367

16 Dec 2019

In Defense of Uniform Convergence: Generalization via derandomization with an application to interpolating predictors

International Conference on Machine Learning (ICML), 2019

Jeffrey Negrea

Gintare Karolina Dziugaite

Daniel M. Roy

AI4CE

358

09 Dec 2019

A Rademacher Complexity Based Method for Controlling Power and Confidence Level in Adaptive Statistical Analysis

International Conference on Data Science and Advanced Analytics (DSAA), 2019

L. Stefani

E. Upfal

255

04 Oct 2019

A New Analysis of Differential Privacy's Generalization Guarantees

Innovations in Theoretical Computer Science (ITCS), 2019

Saeed Sharifi-Malvajerdi

Moshe Shenfeld

FedML

337

09 Sep 2019

How Good is SGD with Random Shuffling?

Annual Conference Computational Learning Theory (COLT), 2019

Itay Safran

Ohad Shamir

730

31 Jul 2019

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization

SIAM International Conference on Data Mining (SDM), 2019

290

24 Jul 2019

Kernel and Rich Regimes in Overparametrized Models

Annual Conference Computational Learning Theory (COLT), 2019

546

407

13 Jun 2019

On Exact Computation with an Infinitely Wide Neural Net

828

1,023

26 Apr 2019

On the Convergence of Adam and Beyond

Sashank J. Reddi

Satyen Kale

Sanjiv Kumar

1.3K

2,864

19 Apr 2019

Communication-efficient distributed SGD with Sketching

382

226

12 Mar 2019

Uniform convergence may be unable to explain generalization in deep learning

Neural Information Processing Systems (NeurIPS), 2019

Vaishnavh Nagarajan

J. Zico Kolter

MoMe AI4CE

612

351

13 Feb 2019

An Investigation into Neural Net Optimization via Hessian Eigenvalue Density

510

407

29 Jan 2019

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

864

1,050

24 Jan 2019

Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians

Vardan Papyan

239

24 Jan 2019

Gradient Descent Happens in a Tiny Subspace

Guy Gur-Ari

Daniel A. Roberts

Ethan Dyer

387

278

12 Dec 2018

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

Quanquan Gu

621

450

21 Nov 2018

A Convergence Theory for Deep Learning via Over-Parameterization

International Conference on Machine Learning (ICML), 2018

1.8K

1,593

09 Nov 2018

Gradient Descent Finds Global Minima of Deep Neural Networks

International Conference on Machine Learning (ICML), 2018

1.5K

1,224

09 Nov 2018

Uniform Convergence of Gradients for Non-Convex Learning and Optimization

Dylan J. Foster

Ayush Sekhari

Karthik Sridharan

341

25 Oct 2018

Graphical Convergence of Subgradients in Nonconvex Optimization and Learning

Damek Davis

Dmitriy Drusvyatskiy

208

17 Oct 2018

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

Aarti Singh

885

1,358

04 Oct 2018

Random Shuffling Beats SGD after Finite Epochs

International Conference on Machine Learning (ICML), 2018

Jeff Z. HaoChen

S. Sra

290

108

26 Jun 2018

Neural Tangent Kernel: Convergence and Generalization in Neural Networks

Arthur Jacot

Franck Gabriel

Clément Hongler

3.5K

3,892

20 Jun 2018

The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning

Siyuan Ma

Raef Bassily

M. Belkin

415

323

18 Dec 2017

Size-Independent Sample Complexity of Neural Networks

Noah Golowich

Alexander Rakhlin

Ohad Shamir

661

618

18 Dec 2017

Spectrally-normalized margin bounds for neural networks

989

1,404

26 Jun 2017

Understanding deep learning requires rethinking generalization

Benjamin Recht

958

5,031

10 Nov 2016

The Landscape of Empirical Risk for Non-convex Losses

Song Mei

Yu Bai

Andrea Montanari

462

326

22 Jul 2016

Gaussian Error Linear Units (GELUs)

Dan Hendrycks

Kevin Gimpel

1.7K

6,642

27 Jun 2016

Optimization Methods for Large-Scale Machine Learning

Léon Bottou

Frank E. Curtis

J. Nocedal

1.1K

3,746

15 Jun 2016

Bounds for Vector-Valued Function Estimation

Andreas Maurer

Massimiliano Pontil

193

05 Jun 2016

A vector-contraction inequality for Rademacher complexities

Andreas Maurer

345

295

01 May 2016

Train faster, generalize better: Stability of stochastic gradient descent

Moritz Hardt

Benjamin Recht

Y. Singer

560

1,400

03 Sep 2015

Generalization in Adaptive Data Analysis and Holdout Reuse

Neural Information Processing Systems (NeurIPS), 2015

311

252

08 Jun 2015

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

1.4K

20,336

06 Feb 2015

An Introduction to Matrix Concentration Inequalities

J. Tropp

864

1,275

07 Jan 2015

Preserving Statistical Validity in Adaptive Data Analysis

Symposium on the Theory of Computing (STOC), 2014

425

405

10 Nov 2014

Interactive Fingerprinting Codes and the Hardness of Preventing False Discovery

Information Theory and Applications Workshop (ITA), 2014

Thomas Steinke

Jonathan R. Ullman

284

115

05 Oct 2014

Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds

493

362

27 May 2014