ResearchTrend.AI
Gradient Descent Happens in a Tiny Subspace
Guy Gur-Ari, Daniel A. Roberts, Ethan Dyer. 12 December 2018. arXiv:1812.04754.

Papers citing "Gradient Descent Happens in a Tiny Subspace"

50 / 163 papers shown
Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks
F. Chen, D. Kunin, Atsushi Yamamura, Surya Ganguli. 07 Jun 2023.

Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models
Zhong Zhang, Bang Liu, Junming Shao. 27 May 2023.

On the special role of class-selective neurons in early training
Omkar Ranadive, Nikhil Thakurdesai, Ari S. Morcos, Matthew L. Leavitt, Stéphane Deny. 27 May 2023.
The Hessian perspective into the Nature of Convolutional Neural Networks
Sidak Pal Singh, Thomas Hofmann, Bernhard Schölkopf. 16 May 2023.

Learning Linear Embeddings for Non-Linear Network Dynamics with Koopman Message Passing
King Fai Yeh, Paris D. L. Flood, William T. Redman, Pietro Lio'. 15 May 2023.

PGrad: Learning Principal Gradients For Domain Generalization
Zhe Wang, J. E. Grigsby, Yanjun Qi. 02 May 2023.
Towards a Phenomenological Understanding of Neural Networks: Data
S. Tovey, Sven Krippendorf, Konstantin Nikolaou, Daniel Fink. 01 May 2023.

Low Rank Optimization for Efficient Deep Learning: Making A Balance between Compact Architecture and Fast Training
Xinwei Ou, Zhangxin Chen, Ce Zhu, Yipeng Liu. 22 Mar 2023.

Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance
Xin Gu, Gautam Kamath, Zhiwei Steven Wu. 02 Mar 2023.
Identifying Equivalent Training Dynamics
William T. Redman, J. M. Bello-Rivas, M. Fonoberova, Ryan Mohr, Ioannis G. Kevrekidis, Igor Mezić. 17 Feb 2023.

Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
Vladimir Feinberg, Xinyi Chen, Y. Jennifer Sun, Rohan Anil, Elad Hazan. 07 Feb 2023.

On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin. 03 Feb 2023.
STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition
Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Chris De Sa, Amir Yazdanbakhsh. 02 Feb 2023.

Communication-Efficient Federated Learning for Heterogeneous Edge Devices Based on Adaptive Gradient Quantization
Heting Liu, Fang He, Guohong Cao. 16 Dec 2022.

Accelerating Dataset Distillation via Model Augmentation
Lei Zhang, Jie M. Zhang, Bowen Lei, Subhabrata Mukherjee, Xiang Pan, Bo-Lu Zhao, Caiwen Ding, Y. Li, Dongkuan Xu. 12 Dec 2022.
On the Overlooked Structure of Stochastic Gradients
Zeke Xie, Qian-Yuan Tang, Mingming Sun, P. Li. 05 Dec 2022.

Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Ziqiao Wang, Yongyi Mao. 19 Nov 2022.

Robust Federated Learning against both Data Heterogeneity and Poisoning Attack via Aggregation Optimization
Yueqi Xie, Weizhong Zhang, Renjie Pi, Fangzhao Wu, Qifeng Chen, Xing Xie, Sunghun Kim. 10 Nov 2022.

Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation
Cody Blakeney, Jessica Zosa Forde, Jonathan Frankle, Ziliang Zong, Matthew L. Leavitt. 01 Nov 2022.
A picture of the space of typical learnable tasks
Rahul Ramesh, J. Mao, Itay Griniasty, Rubing Yang, H. Teoh, M. Transtrum, J. Sethna, Pratik Chaudhari. 31 Oct 2022.

Noise Injection as a Probe of Deep Learning Dynamics
Noam Levi, I. Bloch, M. Freytsis, T. Volansky. 24 Oct 2022.

Precision Machine Learning
Eric J. Michaud, Ziming Liu, Max Tegmark. 24 Oct 2022.

On the optimization and pruning for Bayesian deep learning
X. Ke, Yanan Fan. 24 Oct 2022.
Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging
Jean Kaddour. 29 Sep 2022.

LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity
Martin Gubri, Maxime Cordy, Mike Papadakis, Yves Le Traon, Koushik Sen. 26 Jul 2022.

When Does Differentially Private Learning Not Suffer in High Dimensions?
Xuechen Li, Daogao Liu, Tatsunori Hashimoto, Huseyin A. Inan, Janardhan Kulkarni, Y. Lee, Abhradeep Thakurta. 01 Jul 2022.
Winning the Lottery Ahead of Time: Efficient Early Network Pruning
John Rachwan, Daniel Zügner, Bertrand Charpentier, Simon Geisler, Morgane Ayle, Stephan Günnemann. 21 Jun 2022.

Gradient Descent for Low-Rank Functions
Romain Cosson, Ali Jadbabaie, A. Makur, Amirhossein Reisizadeh, Devavrat Shah. 16 Jun 2022.

Few-Shot Learning by Dimensionality Reduction in Gradient Space
M. Gauch, M. Beck, Thomas Adler, D. Kotsur, Stefan Fiel, ..., Markus Holzleitner, Werner Zellinger, D. Klotz, Sepp Hochreiter, Sebastian Lehner. 07 Jun 2022.
Trainable Weight Averaging: Accelerating Training and Improving Generalization
Tao Li, Zhehao Huang, Yingwen Wu, Zhengbao He, Qinghua Tao, X. Huang, Chih-Jen Lin. 26 May 2022.

Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
Kushal Tirumala, Aram H. Markosyan, Luke Zettlemoyer, Armen Aghajanyan. 22 May 2022.

Deep learning, stochastic gradient descent and diffusion maps
Carmina Fjellström, Kaj Nyström. 04 Apr 2022.

Origami in N dimensions: How feed-forward networks manufacture linear separability
Christian Keup, M. Helias. 21 Mar 2022.
Towards understanding deep learning with the natural clustering prior
Simon Carbonnelle. 15 Mar 2022.

Recycling Model Updates in Federated Learning: Are Gradient Subspaces Low-Rank?
Sheikh Shams Azam, Seyyedali Hosseinalipour, Qiang Qiu, Christopher G. Brinton. 01 Feb 2022.

Generalizing to New Physical Systems via Context-Informed Dynamics Model
Matthieu Kirchmeyer, Yuan Yin, Jérémie Donà, Nicolas Baskiotis, A. Rakotomamonjy, Patrick Gallinari. 01 Feb 2022.

On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, P. Li. 31 Jan 2022.
Eigenvalues of Autoencoders in Training and at Initialization
Ben Dees, S. Agarwala, Corey Lowman. 27 Jan 2022.

There is a Singularity in the Loss Landscape
M. Lowell. 12 Jan 2022.

Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics
Chunheng Jiang, Tejaswini Pedapati, Pin-Yu Chen, Yizhou Sun, Jianxi Gao. 11 Jan 2022.

Federated Optimization of Smooth Loss Functions
Ali Jadbabaie, A. Makur, Devavrat Shah. 06 Jan 2022.
Conditional Imitation Learning for Multi-Agent Games
Andy Shih, Stefano Ermon, Dorsa Sadigh. 05 Jan 2022.

Public Data-Assisted Mirror Descent for Private Model Training
Ehsan Amid, Arun Ganesh, Rajiv Mathews, Swaroop Indra Ramaswamy, Shuang Song, Thomas Steinke, Vinith M. Suriyakumar, Om Thakkar, Abhradeep Thakurta. 01 Dec 2021.

Subspace Adversarial Training
Tao Li, Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang. 24 Nov 2021.
MIO: Mutual Information Optimization using Self-Supervised Binary Contrastive Learning
Siladittya Manna, Umapada Pal, Saumik Bhattacharya. 24 Nov 2021.

An Operator Theoretic View on Pruning Deep Neural Networks
William T. Redman, M. Fonoberova, Ryan Mohr, Y. Kevrekidis, Igor Mezić. 28 Oct 2021.

Does the Data Induce Capacity Control in Deep Learning?
Rubing Yang, J. Mao, Pratik Chaudhari. 27 Oct 2021.
Universality of Winning Tickets: A Renormalization Group Perspective
William T. Redman, Tianlong Chen, Zhangyang Wang, Akshunna S. Dogra. 07 Oct 2021.

On the Impact of Stable Ranks in Deep Nets
B. Georgiev, L. Franken, Mayukh Mukherjee, Georgios Arvanitidis. 05 Oct 2021.

Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization
Alexandre Ramé, Corentin Dancette, Matthieu Cord. 07 Sep 2021.