A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization

7 December 2020
Adepu Ravi Sankar
Yash Khasbage
Rahul Vigneswaran
V. Balasubramanian

Papers citing "A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization"

35 papers shown
Low-Rank Curvature for Zeroth-Order Optimization in LLM Fine-Tuning
Hyunseok Seung
Jaewoo Lee
Hyunsuk Ko
11 Nov 2025
Adam or Gauss-Newton? A Comparative Study In Terms of Basis Alignment and SGD Noise
Bingbin Liu
Rachit Bansal
Depen Morwani
Nikhil Vyas
David Alvarez-Melis
Sham Kakade
15 Oct 2025
AppForge: From Assistant to Independent Developer - Are GPTs Ready for Software Development?
Dezhi Ran
Yuan Cao
Mengzhou Wu
Simin Chen
Yuzhe Guo
...
Jialei Wei
Linyi Li
Wei Yang
Baishakhi Ray
Tao Xie
09 Oct 2025
Flatness-Aware Stochastic Gradient Langevin Dynamics
Stefano Bruno
Youngsik Hwang
Jaehyeon An
Sotirios Sabanis
Dong-Young Lim
02 Oct 2025
Understanding SOAP from the Perspective of Gradient Whitening
Yanqing Lu
Letao Wang
Jinbo Liu
26 Sep 2025
Information-Theoretic Framework for Understanding Modern Machine-Learning
M. Feder
Ruediger Urbanke
Yaniv Fogel
09 Jun 2025
TRACE for Tracking the Emergence of Semantic Representations in Transformers
Nura Aljaafari
Danilo S. Carvalho
André Freitas
23 May 2025
HessFormer: Hessians at Foundation Scale
Diego Granziol
16 May 2025
Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong
Yushun Zhang
Jianfeng Yao
05 May 2025
Connecting Parameter Magnitudes and Hessian Eigenspaces at Scale using Sketched Methods
Andres Fernandez
Frank Schneider
Maren Mahsereci
Philipp Hennig
20 Apr 2025
Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective
Sizhuang He
Ananyae Kumar Bhartari
Bowen Li
P. Perdikaris
02 Feb 2025
A Hessian-informed hyperparameter optimization for differential learning rate
Shiyun Xu
Zhiqi Bu
Yiliang Zhang
Ian Barnett
12 Jan 2025
Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP)
Mohammad Asif Ibna Mustafa
Ferdinand Heinrich
14 Oct 2024
A New Perspective on Shampoo's Preconditioner
Depen Morwani
Itai Shapira
Nikhil Vyas
Eran Malach
Sham Kakade
Lucas Janson
25 Jun 2024
Adam-mini: Use Fewer Learning Rates To Gain More
Yushun Zhang
Congliang Chen
Ziniu Li
Tian Ding
Chenwei Wu
Yinyu Ye
Zhi-Quan Luo
24 Jun 2024
Exact Gauss-Newton Optimization for Training Deep Neural Networks
Mikalai Korbit
Adeyemi Damilare Adeoye
Alberto Bemporad
Mario Zanon
23 May 2024
Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks
Yichu Xu
Xin-Chun Li
Lan Li
De-Chuan Zhan
21 May 2024
Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model
Connall Garrod
Jonathan P. Keating
09 Apr 2024
Continual Learning with Weight Interpolation
Jędrzej Kozal
Jan Wasilewski
Bartosz Krawczyk
Michał Woźniak
05 Apr 2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Zhi-Quan Luo
26 Feb 2024
Ginger: An Efficient Curvature Approximation with Linear Complexity for General Neural Networks
Yongchang Hao
Yanshuai Cao
Lili Mou
05 Feb 2024
Neglected Hessian component explains mysteries in Sharpness regularization
Yann N. Dauphin
Atish Agarwala
Hossein Mobahi
19 Jan 2024
FAM: Relative Flatness Aware Minimization
Linara Adilova
Amr Abourayya
Jianning Li
Amin Dada
Henning Petzka
Jan Egger
Jens Kleesiek
Michael Kamp
05 Jul 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
International Conference on Learning Representations (ICLR), 2023
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Abigail Z. Jacobs
Tengyu Ma
23 May 2023
A Theory on Adam Instability in Large-Scale Machine Learning
Igor Molybog
Peter Albert
Moya Chen
Zach DeVito
David Esiobu
...
Puxin Xu
Yuchen Zhang
Melanie Kambadur
Stephen Roller
Susan Zhang
19 Apr 2023
Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
Neural Information Processing Systems (NeurIPS), 2023
Vladimir Feinberg
Xinyi Chen
Y. Jennifer Sun
Rohan Anil
Elad Hazan
07 Feb 2023
Generalisation under gradient descent via deterministic PAC-Bayes
International Conference on Algorithmic Learning Theory (ALT), 2022
Eugenio Clerico
Tyler Farghly
George Deligiannidis
Benjamin Guedj
Arnaud Doucet
06 Sep 2022
Regularizing Deep Neural Networks with Stochastic Estimators of Hessian Trace
Yucong Liu
Shixing Yu
Tong Lin
11 Aug 2022
Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules
Neural Information Processing Systems (NeurIPS), 2022
Yuhan Helena Liu
Arna Ghosh
Blake A. Richards
E. Shea-Brown
Guillaume Lajoie
02 Jun 2022
TorchNTK: A Library for Calculation of Neural Tangent Kernels of PyTorch Models
A. Engel
Zhichao Wang
Anand D. Sarwate
Sutanay Choudhury
Tony Chiang
24 May 2022
Neuronal diversity can improve machine learning for physics and beyond
Scientific Reports (Sci Rep), 2022
A. Choudhary
Anil Radhakrishnan
J. Lindner
S. Sinha
W. Ditto
09 Apr 2022
When Do Flat Minima Optimizers Work?
Neural Information Processing Systems (NeurIPS), 2022
Jean Kaddour
Linqing Liu
Ricardo M. A. Silva
Matt J. Kusner
01 Feb 2022
On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie
Qian-Yuan Tang
Yunfeng Cai
Mingming Sun
P. Li
31 Jan 2022
Hessian Eigenspectra of More Realistic Nonlinear Models
Neural Information Processing Systems (NeurIPS), 2021
Zhenyu Liao
Michael W. Mahoney
02 Mar 2021
Shallow Univariate ReLu Networks as Splines: Initialization, Loss Surface, Hessian, & Gradient Flow Dynamics
Justin Sahs
Ryan Pyle
Aneel Damaraju
J. O. Caro
Onur Tavaslioglu
Andy Lu
Ankit B. Patel
04 Aug 2020