arXiv:1812.04754
Gradient Descent Happens in a Tiny Subspace
12 December 2018
Guy Gur-Ari
Daniel A. Roberts
Ethan Dyer
Papers citing "Gradient Descent Happens in a Tiny Subspace" (50 of 163 shown)
Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong
Yushun Zhang
Z. Luo
Jianfeng Yao
Ruoyu Sun
28
0
0
05 May 2025
ASGO: Adaptive Structured Gradient Optimization
Kang An
Yuxing Liu
Rui Pan
Shiqian Ma
D. Goldfarb
Tong Zhang
ODL
97
2
0
26 Mar 2025
Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning
Saber Malekmohammadi
Yaoliang Yu
Yang Cao
FedML
83
5
0
17 Feb 2025
SubTrack your Grad: Gradient Subspace Tracking for Memory and Time Efficient Full-Parameter LLM Training
Sahar Rajabi
Nayeema Nonta
Sirisha Rambhatla
90
0
0
03 Feb 2025
Position: Curvature Matrices Should Be Democratized via Linear Operators
Felix Dangel
Runa Eschenhagen
Weronika Ormaniec
Andres Fernandez
Lukas Tatzel
Agustinus Kristiadi
58
3
0
31 Jan 2025
FOCUS: First Order Concentrated Updating Scheme
Yizhou Liu
Ziming Liu
Jeff Gore
ODL
108
1
0
21 Jan 2025
Understanding Gradient Descent through the Training Jacobian
Nora Belrose
Adam Scherlis
72
1
0
09 Dec 2024
On Generalization Bounds for Neural Networks with Low Rank Layers
Andrea Pinto
Akshay Rangamani
T. Poggio
AI4CE
82
1
0
20 Nov 2024
The Persistence of Neural Collapse Despite Low-Rank Bias: An Analytic Perspective Through Unconstrained Features
Connall Garrod
Jonathan P. Keating
34
2
0
30 Oct 2024
Influential Language Data Selection via Gradient Trajectory Pursuit
Zhiwei Deng
Tao Li
Yang Li
26
1
0
22 Oct 2024
Debiasing Mini-Batch Quadratics for Applications in Deep Learning
Lukas Tatzel
Bálint Mucsányi
Osane Hackel
Philipp Hennig
43
0
0
18 Oct 2024
Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP)
Mohammad Asif Ibna Mustafa
Ferdinand Heinrich
AI4TS
22
0
0
14 Oct 2024
Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning
Nusrat Jahan Prottasha
Asif Mahmud
Md. Shohanur Islam Sobuj
Prakash Bhat
Md. Kowsher
Niloofar Yousefi
O. Garibay
30
4
0
11 Oct 2024
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
Fabian Paischer
Lukas Hauzenberger
Thomas Schmied
Benedikt Alkin
Marc Peter Deisenroth
Sepp Hochreiter
29
4
0
09 Oct 2024
PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning
Qibin Wang
Xiaolin Hu
Weikai Xu
Wei Liu
Jian Luan
Bin Wang
28
1
0
25 Sep 2024
Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape
Tao Li
Zhengbao He
Yujun Li
Yasheng Wang
Lifeng Shang
X. Huang
53
0
0
22 Sep 2024
Communication-Efficient Federated Low-Rank Update Algorithm and its Connection to Implicit Regularization
Haemin Park
Diego Klabjan
FedML
32
0
0
19 Sep 2024
Propulsion: Steering LLM with Tiny Fine-Tuning
Md. Kowsher
Nusrat Jahan Prottasha
Prakash Bhat
38
4
0
17 Sep 2024
Memory-Efficient LLM Training with Online Subspace Descent
Kaizhao Liang
Bo Liu
Lizhang Chen
Qiang Liu
29
7
0
23 Aug 2024
LoRA-GA: Low-Rank Adaptation with Gradient Approximation
Shaowen Wang
Linxi Yu
Jian Li
ALM
AI4CE
26
27
0
06 Jul 2024
Adam-mini: Use Fewer Learning Rates To Gain More
Yushun Zhang
Congliang Chen
Ziniu Li
Tian Ding
Chenwei Wu
Yinyu Ye
Zhi-Quan Luo
Ruoyu Sun
36
36
0
24 Jun 2024
Loss Gradient Gaussian Width based Generalization and Optimization Guarantees
A. Banerjee
Qiaobo Li
Yingxue Zhou
44
0
0
11 Jun 2024
Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment
Mark Lowell
Catharine A. Kastner
25
0
0
31 May 2024
Recurrent neural networks: vanishing and exploding gradients are not the end of the story
Nicolas Zucchet
Antonio Orvieto
ODL
AAML
40
9
0
31 May 2024
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
Roy Miles
Pradyumna Reddy
Ismail Elezi
Jiankang Deng
VLM
32
3
0
28 May 2024
Phase Transitions in the Output Distribution of Large Language Models
Julian Arnold
Flemming Holtorf
Frank Schäfer
Niels Lörch
41
1
0
27 May 2024
LoQT: Low Rank Adapters for Quantized Training
Sebastian Loeschcke
M. Toftrup
M. Kastoryano
Serge J. Belongie
Vésteinn Snæbjarnarson
MQ
34
3
0
26 May 2024
Does SGD really happen in tiny subspaces?
Minhak Song
Kwangjun Ahn
Chulhee Yun
66
4
1
25 May 2024
Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks
Xin-Chun Li
Lan Li
De-Chuan Zhan
33
2
0
21 May 2024
Differentially Private Federated Learning without Noise Addition: When is it Possible?
Jiang Zhang
Konstantinos Psounis
FedML
40
0
0
06 May 2024
Machine Unlearning via Null Space Calibration
Huiqiang Chen
Tianqing Zhu
Xin Yu
Wanlei Zhou
39
6
0
21 Apr 2024
Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model
Connall Garrod
Jonathan P. Keating
33
8
0
09 Apr 2024
Random Search as a Baseline for Sparse Neural Network Architecture Search
Rezsa Farahani
25
0
0
13 Mar 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao
Zhenyu (Allen) Zhang
Beidi Chen
Zhangyang Wang
A. Anandkumar
Yuandong Tian
43
173
0
06 Mar 2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Ruoyu Sun
Zhimin Luo
37
41
0
26 Feb 2024
Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features
Rodrigo Veiga
Anastasia Remizova
Nicolas Macris
34
0
0
12 Feb 2024
On Differentially Private Subspace Estimation in a Distribution-Free Setting
Eliad Tsfadia
23
1
0
09 Feb 2024
Deconstructing the Goldilocks Zone of Neural Network Initialization
Artem Vysogorets
Anna Dawid
Julia Kempe
38
1
0
05 Feb 2024
Identifying Policy Gradient Subspaces
Jan Schneider-Barnes
Pierre Schumacher
Simon Guist
Le Chen
D. Haeufle
Bernhard Schölkopf
Dieter Büchler
36
5
0
12 Jan 2024
Enhancing Neural Training via a Correlated Dynamics Model
Jonathan Brokman
Roy Betser
Rotem Turjeman
Tom Berkov
I. Cohen
Guy Gilboa
24
3
0
20 Dec 2023
PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance
Haichao Sha
Ruixuan Liu
Yi-xiao Liu
Hong Chen
52
1
0
06 Dec 2023
Directions of Curvature as an Explanation for Loss of Plasticity
Alex Lewandowski
Haruto Tanaka
Dale Schuurmans
Marlos C. Machado
11
5
0
30 Nov 2023
Low-Dimensional Gradient Helps Out-of-Distribution Detection
Yingwen Wu
Tao Li
Xinwen Cheng
Jie-jin Yang
Xiaolin Huang
OODD
49
3
0
26 Oct 2023
DPZero: Private Fine-Tuning of Language Models without Backpropagation
Liang Zhang
Bingcong Li
K. K. Thekumparampil
Sewoong Oh
Niao He
28
11
0
14 Oct 2023
Spectral alignment of stochastic gradient descent for high-dimensional classification tasks
Gerard Ben Arous
Reza Gheissari
Jiaoyang Huang
Aukosh Jagannath
27
14
0
04 Oct 2023
Towards guarantees for parameter isolation in continual learning
Giulia Lanzillotta
Sidak Pal Singh
Benjamin Grewe
Thomas Hofmann
27
0
0
02 Oct 2023
Separable Gaussian Neural Networks: Structure, Analysis, and Function Approximations
S. Xing
Jianqiao Sun
12
6
0
13 Aug 2023
Unveiling the Hessian's Connection to the Decision Boundary
Mahalakshmi Sabanayagam
Freya Behrens
Urte Adomaityte
Anna Dawid
20
5
0
12 Jun 2023
Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
Marcel Kühn
B. Rosenow
11
3
0
08 Jun 2023
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Libin Zhu
Chaoyue Liu
Adityanarayanan Radhakrishnan
M. Belkin
30
13
0
07 Jun 2023