A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization

7 December 2020

Papers citing "A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization"

35 / 35 papers shown

Low-Rank Curvature for Zeroth-Order Optimization in LLM Fine-Tuning

Hyunseok Seung

Jaewoo Lee

Hyunsuk Ko

11 Nov 2025

Adam or Gauss-Newton? A Comparative Study In Terms of Basis Alignment and SGD Noise

214

15 Oct 2025

AppForge: From Assistant to Independent Developer - Are GPTs Ready for Software Development?

...

161

09 Oct 2025

Flatness-Aware Stochastic Gradient Langevin Dynamics

260

02 Oct 2025

Understanding SOAP from the Perspective of Gradient Whitening

199

26 Sep 2025

Information-Theoretic Framework for Understanding Modern Machine-Learning

M. Feder

Ruediger Urbanke

Yaniv Fogel

270

09 Jun 2025

TRACE for Tracking the Emergence of Semantic Representations in Transformers

Nura Aljaafari

Danilo S. Carvalho

André Freitas

353

23 May 2025

HessFormer: Hessians at Foundation Scale

Diego Granziol

511

16 May 2025

Towards Quantifying the Hessian Structure of Neural Networks

Zhaorui Dong

Yushun Zhang

Jianfeng Yao

372

05 May 2025

Connecting Parameter Magnitudes and Hessian Eigenspaces at Scale using Sketched Methods

434

20 Apr 2025

Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective

Sizhuang He

Ananyae Kumar Bhartari

Bowen Li

P. Perdikaris

PINN

518

02 Feb 2025

A Hessian-informed hyperparameter optimization for differential learning rate

378

12 Jan 2025

Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP)

Mohammad Asif Ibna Mustafa

Ferdinand Heinrich

AI4TS

384

14 Oct 2024

A New Perspective on Shampoo's Preconditioner

386

25 Jun 2024

Adam-mini: Use Fewer Learning Rates To Gain More

Zhi-Quan Luo

554

109

24 Jun 2024

Exact Gauss-Newton Optimization for Training Deep Neural Networks

Mikalai Korbit

Adeyemi Damilare Adeoye

Alberto Bemporad

Mario Zanon

ODL

408

23 May 2024

Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks

438

21 May 2024

Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model

Connall Garrod

Jonathan P. Keating

410

09 Apr 2024

Continual Learning with Weight Interpolation

554

05 Apr 2024

Why Transformers Need Adam: A Hessian Perspective

Ziniu Li

511

105

26 Feb 2024

Ginger: An Efficient Curvature Approximation with Linear Complexity for General Neural Networks

184

05 Feb 2024

Neglected Hessian component explains mysteries in Sharpness regularization

448

19 Jan 2024

FAM: Relative Flatness Aware Minimization

259

05 Jul 2023

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

International Conference on Learning Representations (ICLR), 2023

Hong Liu

Zhiyuan Li

David Leo Wright Hall

Abigail Z. Jacobs

Tengyu Ma

VLM

723

259

23 May 2023

A Theory on Adam Instability in Large-Scale Machine Learning

Igor Molybog

...

251

19 Apr 2023

Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions

Neural Information Processing Systems (NeurIPS), 2023

284

07 Feb 2023

Generalisation under gradient descent via deterministic PAC-Bayes

International Conference on Algorithmic Learning Theory (ALT), 2022

George Deligiannidis

501

06 Sep 2022

Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace

Yucong Liu

Shixing Yu

Tong Lin

299

11 Aug 2022

Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules

Neural Information Processing Systems (NeurIPS), 2022

609

02 Jun 2022

TorchNTK: A Library for Calculation of Neural Tangent Kernels of PyTorch Models

265

24 May 2022

Neuronal diversity can improve machine learning for physics and beyond

Scientific Reports (Sci Rep), 2022

244

09 Apr 2022

When Do Flat Minima Optimizers Work?

Neural Information Processing Systems (NeurIPS), 2022

638

01 Feb 2022

On the Power-Law Hessian Spectrums in Deep Learning

Zeke Xie

239

31 Jan 2022

Hessian Eigenspectra of More Realistic Nonlinear Models

Neural Information Processing Systems (NeurIPS), 2021

Zhenyu Liao

Michael W. Mahoney

431

02 Mar 2021

Shallow Univariate ReLu Networks as Splines: Initialization, Loss Surface, Hessian, & Gradient Flow Dynamics

278

04 Aug 2020