Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond

22 November 2016
Levent Sagun
Léon Bottou
Yann LeCun
    UQCV

Papers citing "Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond"

49 papers
Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home
Viktor Moskvoretskii
M. Lysyuk
Mikhail Salnikov
Nikolay Ivanov
Sergey Pletenev
Daria Galimzianova
Nikita Krayko
Vasily Konovalov
Irina Nikishina
Alexander Panchenko
RALM
24 Feb 2025
High-dimensional manifold of solutions in neural networks: insights from statistical physics
Enrico M. Malatesta
20 Feb 2025
Position: Curvature Matrices Should Be Democratized via Linear Operators
Felix Dangel
Runa Eschenhagen
Weronika Ormaniec
Andres Fernandez
Lukas Tatzel
Agustinus Kristiadi
31 Jan 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher
Onofrio Semeraro
L. Mathelin
28 Jan 2025
FOCUS: First Order Concentrated Updating Scheme
Yizhou Liu
Ziming Liu
Jeff Gore
ODL
21 Jan 2025
Sketched Adaptive Federated Deep Learning: A Sharp Convergence Analysis
Zhijie Chen
Qiaobo Li
A. Banerjee
FedML
11 Nov 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao
Sidak Pal Singh
Aurelien Lucchi
AI4CE
04 Nov 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec
Felix Dangel
Sidak Pal Singh
14 Oct 2024
Nesterov acceleration in benignly non-convex landscapes
Kanan Gupta
Stephan Wojtowytsch
10 Oct 2024
Unraveling the Hessian: A Key to Smooth Convergence in Loss Function Landscapes
Nikita Kiselev
Andrey Grabovoy
18 Sep 2024
Does SGD really happen in tiny subspaces?
Minhak Song
Kwangjun Ahn
Chulhee Yun
25 May 2024
Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization
S. Reifenstein
T. Leleu
Yoshihisa Yamamoto
02 May 2024
Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer
Yanjun Zhao
Sizhe Dang
Haishan Ye
Guang Dai
Yi Qian
Ivor W. Tsang
23 Feb 2024
Spectral alignment of stochastic gradient descent for high-dimensional classification tasks
Gerard Ben Arous
Reza Gheissari
Jiaoyang Huang
Aukosh Jagannath
04 Oct 2023
Accelerating Distributed ML Training via Selective Synchronization
S. Tyagi
Martin Swany
FedML
16 Jul 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
23 May 2023
GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training
S. Tyagi
Martin Swany
20 May 2023
Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
Vladimir Feinberg
Xinyi Chen
Y. Jennifer Sun
Rohan Anil
Elad Hazan
07 Feb 2023
ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients
Guihong Li
Yuedong Yang
Kartikeya Bhardwaj
R. Marculescu
26 Jan 2023
On the Overlooked Structure of Stochastic Gradients
Zeke Xie
Qian-Yuan Tang
Mingming Sun
P. Li
05 Dec 2022
Noise Injection as a Probe of Deep Learning Dynamics
Noam Levi
I. Bloch
M. Freytsis
T. Volansky
24 Oct 2022
Precision Machine Learning
Eric J. Michaud
Ziming Liu
Max Tegmark
24 Oct 2022
Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability
Z. Li
Zixuan Wang
Jian Li
26 Jul 2022
Laplacian Autoencoders for Learning Stochastic Representations
M. Miani
Frederik Warburg
Pablo Moreno-Muñoz
Nicki Skafte Detlefsen
Søren Hauberg
UQCV
BDL
SSL
30 Jun 2022
Neural Collapse: A Review on Modelling Principles and Generalization
Vignesh Kothapalli
08 Jun 2022
Recycling Model Updates in Federated Learning: Are Gradient Subspaces Low-Rank?
Sheikh Shams Azam
Seyyedali Hosseinalipour
Qiang Qiu
Christopher G. Brinton
FedML
01 Feb 2022
On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie
Qian-Yuan Tang
Yunfeng Cai
Mingming Sun
P. Li
ODL
31 Jan 2022
Eigenvalues of Autoencoders in Training and at Initialization
Ben Dees
S. Agarwala
Corey Lowman
27 Jan 2022
Impact of classification difficulty on the weight matrices spectra in Deep Learning and application to early-stopping
Xuran Meng
Jianfeng Yao
26 Nov 2021
Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II
Yossi Arjevani
M. Field
21 Jul 2021
The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin
Javier Sagastuy-Breña
Lauren Gillespie
Eshed Margalit
Hidenori Tanaka
Surya Ganguli
Daniel L. K. Yamins
19 Jul 2021
Deep Learning Through the Lens of Example Difficulty
R. Baldock
Hartmut Maennel
Behnam Neyshabur
17 Jun 2021
Appearance of Random Matrix Theory in Deep Learning
Nicholas P. Baskerville
Diego Granziol
J. Keating
12 Feb 2021
A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization
Adepu Ravi Sankar
Yash Khasbage
Rahul Vigneswaran
V. Balasubramanian
07 Dec 2020
A Random Matrix Theory Approach to Damping in Deep Learning
Diego Granziol
Nicholas P. Baskerville
AI4CE
ODL
15 Nov 2020
PEP: Parameter Ensembling by Perturbation
Alireza Mehrtash
Purang Abolmaesumi
Polina Golland
Tina Kapur
Demian Wassermann
W. Wells
24 Oct 2020
An analytic theory of shallow networks dynamics for hinge loss classification
Franco Pellegrini
Giulio Biroli
19 Jun 2020
Directional Pruning of Deep Neural Networks
Shih-Kang Chao
Zhanyu Wang
Yue Xing
Guang Cheng
ODL
16 Jun 2020
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol
S. Zohren
Stephen J. Roberts
ODL
16 Jun 2020
Understanding and mitigating gradient pathologies in physics-informed neural networks
Sizhuang He
Yujun Teng
P. Perdikaris
AI4CE
PINN
13 Jan 2020
Geometry of learning neural quantum states
Chae-Yeun Park
M. Kastoryano
24 Oct 2019
Asymptotics of Wide Networks from Feynman Diagrams
Ethan Dyer
Guy Gur-Ari
25 Sep 2019
Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape
Johanni Brea
Berfin Simsek
Bernd Illing
W. Gerstner
05 Jul 2019
Gradient Descent Happens in a Tiny Subspace
Guy Gur-Ari
Daniel A. Roberts
Ethan Dyer
12 Dec 2018
Local Saddle Point Optimization: A Curvature Exploitation Approach
Leonard Adolphs
Hadi Daneshmand
Aurelien Lucchi
Thomas Hofmann
15 May 2018
Comparing Dynamics: Deep Neural Networks versus Glassy Systems
Marco Baity-Jesi
Levent Sagun
Mario Geiger
S. Spigler
Gerard Ben Arous
C. Cammarota
Yann LeCun
M. Wyart
Giulio Biroli
AI4CE
19 Mar 2018
High Dimensional Spaces, Deep Learning and Adversarial Examples
S. Dube
02 Jan 2018
The loss surface of deep and wide neural networks
Quynh N. Nguyen
Matthias Hein
ODL
26 Apr 2017
Sharp Minima Can Generalize For Deep Nets
Laurent Dinh
Razvan Pascanu
Samy Bengio
Yoshua Bengio
ODL
15 Mar 2017