Gradient Descent Happens in a Tiny Subspace

12 December 2018

Papers citing "Gradient Descent Happens in a Tiny Subspace"

50 / 163 papers shown

Towards Quantifying the Hessian Structure of Neural Networks Zhaorui Dong Yushun Zhang Z. Luo Jianfeng Yao Ruoyu Sun 28 0 0 05 May 2025
ASGO: Adaptive Structured Gradient Optimization Kang An Yuxing Liu Rui Pan Shiqian Ma D. Goldfarb Tong Zhang ODL 97 2 0 26 Mar 2025
Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning Saber Malekmohammadi Yaoliang Yu Yang Cao FedML 83 5 0 17 Feb 2025
SubTrack your Grad: Gradient Subspace Tracking for Memory and Time Efficient Full-Parameter LLM Training Sahar Rajabi Nayeema Nonta Sirisha Rambhatla 90 0 0 03 Feb 2025
Position: Curvature Matrices Should Be Democratized via Linear Operators Felix Dangel Runa Eschenhagen Weronika Ormaniec Andres Fernandez Lukas Tatzel Agustinus Kristiadi 58 3 0 31 Jan 2025
FOCUS: First Order Concentrated Updating Scheme Yizhou Liu Ziming Liu Jeff Gore ODL 108 1 0 21 Jan 2025
Understanding Gradient Descent through the Training Jacobian Nora Belrose Adam Scherlis 72 1 0 09 Dec 2024
On Generalization Bounds for Neural Networks with Low Rank Layers Andrea Pinto Akshay Rangamani T. Poggio AI4CE 82 1 0 20 Nov 2024
The Persistence of Neural Collapse Despite Low-Rank Bias: An Analytic Perspective Through Unconstrained Features Connall Garrod Jonathan P. Keating 34 2 0 30 Oct 2024
Influential Language Data Selection via Gradient Trajectory Pursuit Zhiwei Deng Tao Li Yang Li 26 1 0 22 Oct 2024
Debiasing Mini-Batch Quadratics for Applications in Deep Learning Lukas Tatzel Bálint Mucsányi Osane Hackel Philipp Hennig 43 0 0 18 Oct 2024
Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP) Mohammad Asif Ibna Mustafa Ferdinand Heinrich AI4TS 22 0 0 14 Oct 2024
Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning Nusrat Jahan Prottasha Asif Mahmud Md. Shohanur Islam Sobuj Prakash Bhat Md. Kowsher Niloofar Yousefi O. Garibay 30 4 0 11 Oct 2024
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation Fabian Paischer Lukas Hauzenberger Thomas Schmied Benedikt Alkin Marc Peter Deisenroth Sepp Hochreiter 29 4 0 09 Oct 2024
PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning Qibin Wang Xiaolin Hu Weikai Xu Wei Liu Jian Luan Bin Wang 28 1 0 25 Sep 2024
Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape Tao Li Zhengbao He Yujun Li Yasheng Wang Lifeng Shang X. Huang 53 0 0 22 Sep 2024
Communication-Efficient Federated Low-Rank Update Algorithm and its Connection to Implicit Regularization Haemin Park Diego Klabjan FedML 32 0 0 19 Sep 2024
Propulsion: Steering LLM with Tiny Fine-Tuning Md. Kowsher Nusrat Jahan Prottasha Prakash Bhat 38 4 0 17 Sep 2024
Memory-Efficient LLM Training with Online Subspace Descent Kaizhao Liang Bo Liu Lizhang Chen Qiang Liu 29 7 0 23 Aug 2024
LoRA-GA: Low-Rank Adaptation with Gradient Approximation Shaowen Wang Linxi Yu Jian Li ALM AI4CE 26 27 0 06 Jul 2024
Adam-mini: Use Fewer Learning Rates To Gain More Yushun Zhang Congliang Chen Ziniu Li Tian Ding Chenwei Wu Yinyu Ye Zhi-Quan Luo Ruoyu Sun 36 36 0 24 Jun 2024
Loss Gradient Gaussian Width based Generalization and Optimization Guarantees A. Banerjee Qiaobo Li Yingxue Zhou 44 0 0 11 Jun 2024
Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment Mark Lowell Catharine A. Kastner 25 0 0 31 May 2024
Recurrent neural networks: vanishing and exploding gradients are not the end of the story Nicolas Zucchet Antonio Orvieto ODL AAML 40 9 0 31 May 2024
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections Roy Miles Pradyumna Reddy Ismail Elezi Jiankang Deng VLM 32 3 0 28 May 2024
Phase Transitions in the Output Distribution of Large Language Models Julian Arnold Flemming Holtorf Frank Schäfer Niels Lörch 41 1 0 27 May 2024
LoQT: Low Rank Adapters for Quantized Training Sebastian Loeschcke M. Toftrup M. Kastoryano Serge J. Belongie Vésteinn Snæbjarnarson MQ 34 3 0 26 May 2024
Does SGD really happen in tiny subspaces? Minhak Song Kwangjun Ahn Chulhee Yun 66 4 1 25 May 2024
Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks Xin-Chun Li Lan Li De-Chuan Zhan 33 2 0 21 May 2024
Differentially Private Federated Learning without Noise Addition: When is it Possible? Jiang Zhang Konstantinos Psounis FedML 40 0 0 06 May 2024
Machine Unlearning via Null Space Calibration Huiqiang Chen Tianqing Zhu Xin Yu Wanlei Zhou 39 6 0 21 Apr 2024
Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model Connall Garrod Jonathan P. Keating 33 8 0 09 Apr 2024
Random Search as a Baseline for Sparse Neural Network Architecture Search Rezsa Farahani 25 0 0 13 Mar 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Jiawei Zhao Zhenyu (Allen) Zhang Beidi Chen Zhangyang Wang A. Anandkumar Yuandong Tian 43 173 0 06 Mar 2024
Why Transformers Need Adam: A Hessian Perspective Yushun Zhang Congliang Chen Tian Ding Ziniu Li Ruoyu Sun Zhimin Luo 37 41 0 26 Feb 2024
Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features Rodrigo Veiga Anastasia Remizova Nicolas Macris 34 0 0 12 Feb 2024
On Differentially Private Subspace Estimation in a Distribution-Free Setting Eliad Tsfadia 23 1 0 09 Feb 2024
Deconstructing the Goldilocks Zone of Neural Network Initialization Artem Vysogorets Anna Dawid Julia Kempe 38 1 0 05 Feb 2024
Identifying Policy Gradient Subspaces Jan Schneider-Barnes Pierre Schumacher Simon Guist Le Chen D. Haeufle Bernhard Schölkopf Dieter Büchler 36 5 0 12 Jan 2024
Enhancing Neural Training via a Correlated Dynamics Model Jonathan Brokman Roy Betser Rotem Turjeman Tom Berkov I. Cohen Guy Gilboa 24 3 0 20 Dec 2023
PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance Haichao Sha Ruixuan Liu Yi-xiao Liu Hong Chen 52 1 0 06 Dec 2023
Directions of Curvature as an Explanation for Loss of Plasticity Alex Lewandowski Haruto Tanaka Dale Schuurmans Marlos C. Machado 11 5 0 30 Nov 2023
Low-Dimensional Gradient Helps Out-of-Distribution Detection Yingwen Wu Tao Li Xinwen Cheng Jie-jin Yang Xiaolin Huang OODD 49 3 0 26 Oct 2023
DPZero: Private Fine-Tuning of Language Models without Backpropagation Liang Zhang Bingcong Li K. K. Thekumparampil Sewoong Oh Niao He 28 11 0 14 Oct 2023
Spectral alignment of stochastic gradient descent for high-dimensional classification tasks Gerard Ben Arous Reza Gheissari Jiaoyang Huang Aukosh Jagannath 27 14 0 04 Oct 2023
Towards guarantees for parameter isolation in continual learning Giulia Lanzillotta Sidak Pal Singh Benjamin Grewe Thomas Hofmann 27 0 0 02 Oct 2023
Separable Gaussian Neural Networks: Structure, Analysis, and Function Approximations S. Xing Jianqiao Sun 12 6 0 13 Aug 2023
Unveiling the Hessian's Connection to the Decision Boundary Mahalakshmi Sabanayagam Freya Behrens Urte Adomaityte Anna Dawid 20 5 0 12 Jun 2023
Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances Marcel Kühn B. Rosenow 11 3 0 08 Jun 2023
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning Libin Zhu Chaoyue Liu Adityanarayanan Radhakrishnan M. Belkin 30 13 0 07 Jun 2023