ResearchTrend.AI

The Benefits of Implicit Regularization from SGD in Least Squares Problems

Neural Information Processing Systems (NeurIPS), 2021
10 August 2021
Difan Zou
Jingfeng Wu
Vladimir Braverman
Quanquan Gu
Dean Phillips Foster
Sham Kakade
ArXiv (abs) · PDF · HTML · GitHub
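The setting the paper studies can be illustrated with a minimal sketch (not the paper's code): one-pass SGD with iterate averaging on a linear least-squares problem, compared against an explicitly regularized ridge solution. The dimensions, step size, and ridge penalty below are arbitrary choices for illustration.

```python
import numpy as np

# Illustrative sketch: one-pass SGD with iterate averaging on least squares,
# versus ridge regression with an explicit penalty.
rng = np.random.default_rng(0)
n, d = 500, 20
w_star = rng.normal(size=d) / np.sqrt(d)      # ground-truth weights
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)     # noisy linear observations

# One-pass SGD: each example is used exactly once, constant step size.
lr = 0.01
w = np.zeros(d)
avg = np.zeros(d)
for t in range(n):
    g = (X[t] @ w - y[t]) * X[t]              # stochastic gradient of 0.5*(x.w - y)^2
    w -= lr * g
    avg += w
w_sgd = avg / n                               # averaged iterate

# Ridge regression for comparison (penalty chosen arbitrarily here).
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("SGD   parameter error:", np.linalg.norm(w_sgd - w_star))
print("ridge parameter error:", np.linalg.norm(w_ridge - w_star))
```

Here SGD never adds a penalty term, yet its constant step size and averaging act as an implicit regularizer; the paper's contribution is a precise instance-based comparison of this effect with explicit ridge regularization.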

Papers citing "The Benefits of Implicit Regularization from SGD in Least Squares Problems"

26 papers shown
On the Interplay between Graph Structure and Learning Algorithms in Graph Neural Networks
Junwei Su, Chuan Wu
20 Aug 2025
Improved Scaling Laws in Linear Regression via Data Reuse
Licong Lin, Jingfeng Wu, Peter Bartlett
10 Jun 2025
Learning Curves of Stochastic Gradient Descent in Kernel Regression
Haihan Zhang, Weicheng Lin, Yuanshi Liu, Cong Fang
28 May 2025
Memory-Statistics Tradeoff in Continual Learning with Structural Regularization
Haoran Li, Jingfeng Wu, Vladimir Braverman
05 Apr 2025
Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors
Runxi Cheng, Feng Xiong, Yongxian Wei, Wanyun Zhu, Chun Yuan
11 Mar 2025
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Neural Information Processing Systems (NeurIPS), 2024
Xingwu Chen, Lei Zhao, Difan Zou
08 Aug 2024
Scaling Laws in Linear Regression: Compute, Parameters, and Data
Licong Lin, Jingfeng Wu, Sham Kakade, Peter L. Bartlett, Jason D. Lee
12 Jun 2024
On the Benefits of Over-parameterization for Out-of-Distribution Generalization
Yifan Hao, Yong Lin, Difan Zou, Tong Zhang
26 Mar 2024
Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems
Junwei Su, Difan Zou, Chuan Wu
13 Mar 2024
Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics
Soo Min Kwon, Zekai Zhang, Dogyoon Song, Laura Balzano, Qing Qu
08 Nov 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin
14 Sep 2023
Transformers as Support Vector Machines
Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak
31 Aug 2023
Max-Margin Token Selection in Attention Mechanism
Neural Information Processing Systems (NeurIPS), 2023
Davoud Ataee Tarzanagh, Yingcong Li, Xuechen Zhang, Samet Oymak
23 Jun 2023
Federated Learning under Covariate Shifts with Generalization Guarantees
Ali Ramezani-Kebrya, Fanghui Liu, Thomas Pethick, Grigorios G. Chrysos, Volkan Cevher
08 Jun 2023
Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron
International Conference on Machine Learning (ICML), 2023
Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Sham Kakade
03 Mar 2023
Local SGD in Overparameterized Linear Regression
Mike Nguyen, Charly Kirst, Nicole Mücke
20 Oct 2022
Losing momentum in continuous-time stochastic optimisation
Kexin Jin, J. Latz, Chenguang Liu, Alessandro Scagliotti
08 Sep 2022
The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
Neural Information Processing Systems (NeurIPS), 2022
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham Kakade
03 Aug 2022
Implicit Regularization with Polynomial Growth in Deep Tensor Factorization
International Conference on Machine Learning (ICML), 2022
Kais Hariz, Hachem Kadri, Stéphane Ayache, Maher Moakher, Thierry Artières
18 Jul 2022
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
Neural Information Processing Systems (NeurIPS), 2022
Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora
08 Jul 2022
A Novel Fast Exact Subproblem Solver for Stochastic Quasi-Newton Cubic Regularized Optimization
Jarad Forristal, J. Griffin, Wenwen Zhou, S. Yektamaram
19 Apr 2022
Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime
Neural Information Processing Systems (NeurIPS), 2022
Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham Kakade
07 Mar 2022
Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
International Conference on Machine Learning (ICML), 2021
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham Kakade
12 Oct 2021
Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability
AAAI Conference on Artificial Intelligence (AAAI), 2021
Aviv Tamar, Daniel Soudry, E. Zisselman
24 Sep 2021
Comparing Classes of Estimators: When does Gradient Descent Beat Ridge Regression in Linear Models?
Dominic Richards, Guang Cheng, Patrick Rebeschini
26 Aug 2021
Learning distinct features helps, provably
Firas Laakom, Jenni Raitoharju, Alexandros Iosifidis, Moncef Gabbouj
10 Jun 2021