The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size

16 November 2018

Papers citing "The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size"

25 / 25 papers shown

Title
The Persistence of Neural Collapse Despite Low-Rank Bias: An Analytic Perspective Through Unconstrained Features; Connall Garrod, Jonathan P. Keating; 65 4 0; 30 Oct 2024
Exact Gauss-Newton Optimization for Training Deep Neural Networks; Mikalai Korbit, Adeyemi Damilare Adeoye, Alberto Bemporad, Mario Zanon; ODL; 58 1 0; 23 May 2024
Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model; Connall Garrod, Jonathan P. Keating; 109 9 0; 09 Apr 2024
Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of ReLU and Batching; Ganesh Ramachandra Kini, V. Vakilian, Tina Behnia, Jaidev Gill, Christos Thrampoulidis; 72 2 0; 13 Jun 2023
Complexity from Adaptive-Symmetries Breaking: Global Minima in the Statistical Mechanics of Deep Neural Networks; Shaun Li; AI4CE; 75 0 0; 03 Jan 2022
MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning; Siladittya Manna, Umapada Pal, Saumik Bhattacharya; SSL; 123 1 0; 24 Nov 2021
Recent advances in deep learning theory; Fengxiang He, Dacheng Tao; AI4CE; 130 51 0; 20 Dec 2020
A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization; Adepu Ravi Sankar, Yash Khasbage, Rahul Vigneswaran, V. Balasubramanian; 89 44 0; 07 Dec 2020
Traces of Class/Cross-Class Structure Pervade Deep Learning Spectra; Vardan Papyan; 64 80 0; 27 Aug 2020
Prevalence of Neural Collapse during the terminal phase of deep learning training; Vardan Papyan, Xuemei Han, D. Donoho; 252 582 0; 18 Aug 2020
Exploring Weight Importance and Hessian Bias in Model Pruning; Mingchen Li, Yahya Sattar, Christos Thrampoulidis, Samet Oymak; 71 4 0; 19 Jun 2020
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training; Diego Granziol, S. Zohren, Stephen J. Roberts; ODL; 148 50 0; 16 Jun 2020
Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs; Lei Huang, Jie Qin, Li Liu, Fan Zhu, Ling Shao; AI4CE; 86 11 0; 25 Feb 2020
The Geometry of Sign Gradient Descent; Lukas Balles, Fabian Pedregosa, Nicolas Le Roux; ODL; 88 27 0; 19 Feb 2020
Scaling Laws for Neural Language Models; Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei; 668 4,937 0; 23 Jan 2020
Deep Curvature Suite; Diego Granziol, Xingchen Wan, T. Garipov; 3DV; 50 12 0; 20 Dec 2019
On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks; Umut Simsekli, Mert Gurbuzbalaban, T. H. Nguyen, G. Richard, Levent Sagun; 88 59 0; 29 Nov 2019
Geometry of learning neural quantum states; Chae-Yeun Park, M. Kastoryano; 72 63 0; 24 Oct 2019
GradVis: Visualization and Second Order Analysis of Optimization Surfaces during the Training of Deep Neural Networks; Avraam Chatzimichailidis, Franz-Josef Pfreundt, N. Gauger, J. Keuper; 52 10 0; 26 Sep 2019
Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization; Xinyan Li, Qilong Gu, Yingxue Zhou, Tiancong Chen, A. Banerjee; ODL; 88 52 0; 24 Jul 2019
First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise; T. H. Nguyen, Umut Simsekli, Mert Gurbuzbalaban, G. Richard; 79 65 0; 21 Jun 2019
Negative eigenvalues of the Hessian in deep neural networks; Guillaume Alain, Nicolas Le Roux, Pierre-Antoine Manzagol; 71 44 0; 06 Feb 2019
An Investigation into Neural Net Optimization via Hessian Eigenvalue Density; Behrooz Ghorbani, Shankar Krishnan, Ying Xiao; ODL; 113 326 0; 29 Jan 2019
Ambitious Data Science Can Be Painless; Hatef Monajemi, Riccardo Murri, Eric Jonas, Percy Liang, V. Stodden, D. Donoho; 135 13 0; 25 Jan 2019
Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians; Vardan Papyan; 82 88 0; 24 Jan 2019