Gradient Descent Happens in a Tiny Subspace

12 December 2018
Guy Gur-Ari
Daniel A. Roberts
Ethan Dyer
arXiv:1812.04754

Papers citing "Gradient Descent Happens in a Tiny Subspace"

50 / 163 papers shown
Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong
Yushun Zhang
Z. Luo
Jianfeng Yao
Ruoyu Sun
28
0
0
05 May 2025
ASGO: Adaptive Structured Gradient Optimization
Kang An
Yuxing Liu
Rui Pan
Shiqian Ma
D. Goldfarb
Tong Zhang
ODL
97
2
0
26 Mar 2025
Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning
Saber Malekmohammadi
Yaoliang Yu
Yang Cao
FedML
83
5
0
17 Feb 2025
SubTrack your Grad: Gradient Subspace Tracking for Memory and Time Efficient Full-Parameter LLM Training
Sahar Rajabi
Nayeema Nonta
Sirisha Rambhatla
90
0
0
03 Feb 2025
Position: Curvature Matrices Should Be Democratized via Linear Operators
Felix Dangel
Runa Eschenhagen
Weronika Ormaniec
Andres Fernandez
Lukas Tatzel
Agustinus Kristiadi
58
3
0
31 Jan 2025
FOCUS: First Order Concentrated Updating Scheme
Yizhou Liu
Ziming Liu
Jeff Gore
ODL
108
1
0
21 Jan 2025
Understanding Gradient Descent through the Training Jacobian
Nora Belrose
Adam Scherlis
72
1
0
09 Dec 2024
On Generalization Bounds for Neural Networks with Low Rank Layers
Andrea Pinto
Akshay Rangamani
T. Poggio
AI4CE
82
1
0
20 Nov 2024
The Persistence of Neural Collapse Despite Low-Rank Bias: An Analytic Perspective Through Unconstrained Features
Connall Garrod
Jonathan P. Keating
34
2
0
30 Oct 2024
Influential Language Data Selection via Gradient Trajectory Pursuit
Zhiwei Deng
Tao Li
Yang Li
26
1
0
22 Oct 2024
Debiasing Mini-Batch Quadratics for Applications in Deep Learning
Lukas Tatzel
Bálint Mucsányi
Osane Hackel
Philipp Hennig
43
0
0
18 Oct 2024
Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP)
Mohammad Asif Ibna Mustafa
Ferdinand Heinrich
AI4TS
22
0
0
14 Oct 2024
Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning
Nusrat Jahan Prottasha
Asif Mahmud
Md. Shohanur Islam Sobuj
Prakash Bhat
Md. Kowsher
Niloofar Yousefi
O. Garibay
30
4
0
11 Oct 2024
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
Fabian Paischer
Lukas Hauzenberger
Thomas Schmied
Benedikt Alkin
Marc Peter Deisenroth
Sepp Hochreiter
29
4
0
09 Oct 2024
PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning
Qibin Wang
Xiaolin Hu
Weikai Xu
Wei Liu
Jian Luan
Bin Wang
28
1
0
25 Sep 2024
Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape
Tao Li
Zhengbao He
Yujun Li
Yasheng Wang
Lifeng Shang
X. Huang
53
0
0
22 Sep 2024
Communication-Efficient Federated Low-Rank Update Algorithm and its Connection to Implicit Regularization
Haemin Park
Diego Klabjan
FedML
32
0
0
19 Sep 2024
Propulsion: Steering LLM with Tiny Fine-Tuning
Md. Kowsher
Nusrat Jahan Prottasha
Prakash Bhat
38
4
0
17 Sep 2024
Memory-Efficient LLM Training with Online Subspace Descent
Kaizhao Liang
Bo Liu
Lizhang Chen
Qiang Liu
29
7
0
23 Aug 2024
LoRA-GA: Low-Rank Adaptation with Gradient Approximation
Shaowen Wang
Linxi Yu
Jian Li
ALM
AI4CE
26
27
0
06 Jul 2024
Adam-mini: Use Fewer Learning Rates To Gain More
Yushun Zhang
Congliang Chen
Ziniu Li
Tian Ding
Chenwei Wu
Yinyu Ye
Zhi-Quan Luo
Ruoyu Sun
36
36
0
24 Jun 2024
Loss Gradient Gaussian Width based Generalization and Optimization Guarantees
A. Banerjee
Qiaobo Li
Yingxue Zhou
44
0
0
11 Jun 2024
Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment
Mark Lowell
Catharine A. Kastner
25
0
0
31 May 2024
Recurrent neural networks: vanishing and exploding gradients are not the end of the story
Nicolas Zucchet
Antonio Orvieto
ODL
AAML
40
9
0
31 May 2024
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
Roy Miles
Pradyumna Reddy
Ismail Elezi
Jiankang Deng
VLM
32
3
0
28 May 2024
Phase Transitions in the Output Distribution of Large Language Models
Julian Arnold
Flemming Holtorf
Frank Schäfer
Niels Lörch
41
1
0
27 May 2024
LoQT: Low Rank Adapters for Quantized Training
Sebastian Loeschcke
M. Toftrup
M. Kastoryano
Serge J. Belongie
Vésteinn Snæbjarnarson
MQ
34
3
0
26 May 2024
Does SGD really happen in tiny subspaces?
Minhak Song
Kwangjun Ahn
Chulhee Yun
66
4
1
25 May 2024
Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks
Xin-Chun Li
Lan Li
De-Chuan Zhan
33
2
0
21 May 2024
Differentially Private Federated Learning without Noise Addition: When is it Possible?
Jiang Zhang
Konstantinos Psounis
FedML
40
0
0
06 May 2024
Machine Unlearning via Null Space Calibration
Huiqiang Chen
Tianqing Zhu
Xin Yu
Wanlei Zhou
39
6
0
21 Apr 2024
Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model
Connall Garrod
Jonathan P. Keating
33
8
0
09 Apr 2024
Random Search as a Baseline for Sparse Neural Network Architecture Search
Rezsa Farahani
25
0
0
13 Mar 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao
Zhenyu (Allen) Zhang
Beidi Chen
Zhangyang Wang
A. Anandkumar
Yuandong Tian
43
173
0
06 Mar 2024
Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang
Congliang Chen
Tian Ding
Ziniu Li
Ruoyu Sun
Zhimin Luo
37
41
0
26 Feb 2024
Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features
Rodrigo Veiga
Anastasia Remizova
Nicolas Macris
34
0
0
12 Feb 2024
On Differentially Private Subspace Estimation in a Distribution-Free Setting
Eliad Tsfadia
23
1
0
09 Feb 2024
Deconstructing the Goldilocks Zone of Neural Network Initialization
Artem Vysogorets
Anna Dawid
Julia Kempe
38
1
0
05 Feb 2024
Identifying Policy Gradient Subspaces
Jan Schneider-Barnes
Pierre Schumacher
Simon Guist
Le Chen
D. Haeufle
Bernhard Schölkopf
Dieter Büchler
36
5
0
12 Jan 2024
Enhancing Neural Training via a Correlated Dynamics Model
Jonathan Brokman
Roy Betser
Rotem Turjeman
Tom Berkov
I. Cohen
Guy Gilboa
24
3
0
20 Dec 2023
PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance
Haichao Sha
Ruixuan Liu
Yi-xiao Liu
Hong Chen
52
1
0
06 Dec 2023
Directions of Curvature as an Explanation for Loss of Plasticity
Alex Lewandowski
Haruto Tanaka
Dale Schuurmans
Marlos C. Machado
11
5
0
30 Nov 2023
Low-Dimensional Gradient Helps Out-of-Distribution Detection
Yingwen Wu
Tao Li
Xinwen Cheng
Jie-jin Yang
Xiaolin Huang
OODD
49
3
0
26 Oct 2023
DPZero: Private Fine-Tuning of Language Models without Backpropagation
Liang Zhang
Bingcong Li
K. K. Thekumparampil
Sewoong Oh
Niao He
28
11
0
14 Oct 2023
Spectral alignment of stochastic gradient descent for high-dimensional classification tasks
Gerard Ben Arous
Reza Gheissari
Jiaoyang Huang
Aukosh Jagannath
27
14
0
04 Oct 2023
Towards guarantees for parameter isolation in continual learning
Giulia Lanzillotta
Sidak Pal Singh
Benjamin Grewe
Thomas Hofmann
27
0
0
02 Oct 2023
Separable Gaussian Neural Networks: Structure, Analysis, and Function Approximations
S. Xing
Jianqiao Sun
12
6
0
13 Aug 2023
Unveiling the Hessian's Connection to the Decision Boundary
Mahalakshmi Sabanayagam
Freya Behrens
Urte Adomaityte
Anna Dawid
20
5
0
12 Jun 2023
Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
Marcel Kühn
B. Rosenow
11
3
0
08 Jun 2023
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Libin Zhu
Chaoyue Liu
Adityanarayanan Radhakrishnan
M. Belkin
30
13
0
07 Jun 2023