ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1710.10345
  4. Cited By
The Implicit Bias of Gradient Descent on Separable Data

The Implicit Bias of Gradient Descent on Separable Data

27 October 2017
Daniel Soudry
Elad Hoffer
Mor Shpigel Nacson
Suriya Gunasekar
Nathan Srebro
ArXivPDFHTML

Papers citing "The Implicit Bias of Gradient Descent on Separable Data"

42 / 42 papers shown
Title
Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks
Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks
Ichiro Hashimoto
MLT
42
0
0
22 May 2025
Embedding principle of homogeneous neural network for classification problem
Embedding principle of homogeneous neural network for classification problem
Jiahan Zhang
Yaoyu Zhang
Yaoyu Zhang
35
0
0
18 May 2025
An Analytical Characterization of Sloppiness in Neural Networks: Insights from Linear Models
An Analytical Characterization of Sloppiness in Neural Networks: Insights from Linear Models
Jialin Mao
Itay Griniasty
Yan Sun
Mark K. Transtrum
James P. Sethna
Pratik Chaudhari
75
0
0
13 May 2025
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang
Yingbin Liang
Jing Yang
85
0
0
02 May 2025
Weight Ensembling Improves Reasoning in Language Models
Weight Ensembling Improves Reasoning in Language Models
Xingyu Dang
Christina Baek
Kaiyue Wen
Zico Kolter
Aditi Raghunathan
MoMe
LRM
82
1
0
14 Apr 2025
Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks
Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks
Chenyang Zhang
Peifeng Gao
Difan Zou
Yuan Cao
OOD
MLT
108
0
0
11 Apr 2025
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang
Jingfeng Wu
Licong Lin
Peter L. Bartlett
48
0
0
05 Apr 2025
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
Yize Zhao
Tina Behnia
V. Vakilian
Christos Thrampoulidis
124
10
0
20 Feb 2025
The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks
Sholom Schechtman
Nicolas Schreuder
383
0
0
08 Feb 2025
Feasible Learning
Juan Ramirez
Ignacio Hounie
Juan Elenter
Jose Gallego-Posada
Meraj Hashemizadeh
Alejandro Ribeiro
Simon Lacoste-Julien
53
2
0
28 Jan 2025
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
Sajad Movahedi
Antonio Orvieto
Seyed-Mohsen Moosavi-Dezfooli
AI4CE
AAML
423
0
0
15 Oct 2024
SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Hojoon Lee
Dongyoon Hwang
Donghu Kim
Hyunseung Kim
Jun Jet Tai
K. Subramanian
Peter R. Wurman
Jaegul Choo
Peter Stone
Takuma Seno
OffRL
107
14
0
13 Oct 2024
Variational Search Distributions
Variational Search Distributions
Daniel M. Steinberg
Rafael Oliveira
Cheng Soon Ong
Edwin V. Bonilla
50
0
0
10 Sep 2024
Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks
Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks
Amit Peleg
Matthias Hein
45
0
0
04 Jul 2024
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
Subhojyoti Mukherjee
Josiah P. Hanna
Qiaomin Xie
Robert Nowak
143
2
0
07 Jun 2024
When does compositional structure yield compositional generalization? A kernel theory
When does compositional structure yield compositional generalization? A kernel theory
Samuel Lippl
Kim Stachenfeld
NAI
CoGe
150
8
0
26 May 2024
Information-Theoretic Generalization Bounds for Deep Neural Networks
Information-Theoretic Generalization Bounds for Deep Neural Networks
Haiyun He
Christina Lee Yu
77
5
0
04 Apr 2024
Neural Redshift: Random Networks are not Random Functions
Neural Redshift: Random Networks are not Random Functions
Damien Teney
A. Nicolicioiu
Valentin Hartmann
Ehsan Abbasnejad
117
22
0
04 Mar 2024
For Better or For Worse? Learning Minimum Variance Features With Label Augmentation
For Better or For Worse? Learning Minimum Variance Features With Label Augmentation
Muthuraman Chidambaram
Rong Ge
AAML
50
0
0
10 Feb 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
49
18
0
08 Feb 2024
Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models
Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models
David X. Wu
A. Sahai
41
3
0
23 Jun 2023
Penalising the biases in norm regularisation enforces sparsity
Penalising the biases in norm regularisation enforces sparsity
Etienne Boursier
Nicolas Flammarion
74
17
0
02 Mar 2023
Large-time asymptotics in deep learning
Large-time asymptotics in deep learning
Carlos Esteve
Borjan Geshkovski
Dario Pighin
Enrique Zuazua
77
34
0
06 Aug 2020
The Implicit Regularization of Stochastic Gradient Flow for Least
  Squares
The Implicit Regularization of Stochastic Gradient Flow for Least Squares
Alnur Ali
Yan Sun
Robert Tibshirani
54
77
0
17 Mar 2020
Can Implicit Bias Explain Generalization? Stochastic Convex Optimization
  as a Case Study
Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study
Assaf Dauber
M. Feder
Tomer Koren
Roi Livni
39
24
0
13 Mar 2020
Stochastic Gradient Descent on Separable Data: Exact Convergence with a
  Fixed Learning Rate
Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
Mor Shpigel Nacson
Nathan Srebro
Daniel Soudry
FedML
MLT
53
100
0
05 Jun 2018
Implicit Bias of Gradient Descent on Linear Convolutional Networks
Implicit Bias of Gradient Descent on Linear Convolutional Networks
Suriya Gunasekar
Jason D. Lee
Daniel Soudry
Nathan Srebro
MDE
54
408
0
01 Jun 2018
Risk and parameter convergence of logistic regression
Risk and parameter convergence of logistic regression
Ziwei Ji
Matus Telgarsky
35
129
0
20 Mar 2018
Convergence of Gradient Descent on Separable Data
Convergence of Gradient Descent on Separable Data
Mor Shpigel Nacson
Jason D. Lee
Suriya Gunasekar
Pedro H. P. Savarese
Nathan Srebro
Daniel Soudry
60
167
0
05 Mar 2018
Characterizing Implicit Bias in Terms of Optimization Geometry
Characterizing Implicit Bias in Terms of Optimization Geometry
Suriya Gunasekar
Jason D. Lee
Daniel Soudry
Nathan Srebro
AI4CE
62
404
0
22 Feb 2018
Exploring Generalization in Deep Learning
Exploring Generalization in Deep Learning
Behnam Neyshabur
Srinadh Bhojanapalli
David A. McAllester
Nathan Srebro
FAtt
134
1,245
0
27 Jun 2017
Implicit Regularization in Matrix Factorization
Implicit Regularization in Matrix Factorization
Suriya Gunasekar
Blake E. Woodworth
Srinadh Bhojanapalli
Behnam Neyshabur
Nathan Srebro
65
490
0
25 May 2017
Train longer, generalize better: closing the generalization gap in large
  batch training of neural networks
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
142
799
0
24 May 2017
The Marginal Value of Adaptive Gradient Methods in Machine Learning
The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia Wilson
Rebecca Roelofs
Mitchell Stern
Nathan Srebro
Benjamin Recht
ODL
50
1,023
0
23 May 2017
Understanding deep learning requires rethinking generalization
Understanding deep learning requires rethinking generalization
Chiyuan Zhang
Samy Bengio
Moritz Hardt
Benjamin Recht
Oriol Vinyals
HAI
269
4,620
0
10 Nov 2016
Quantized Neural Networks: Training Neural Networks with Low Precision
  Weights and Activations
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
Itay Hubara
Matthieu Courbariaux
Daniel Soudry
Ran El-Yaniv
Yoshua Bengio
MQ
101
1,859
0
22 Sep 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp
  Minima
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
362
2,922
0
15 Sep 2016
Train faster, generalize better: Stability of stochastic gradient
  descent
Train faster, generalize better: Stability of stochastic gradient descent
Moritz Hardt
Benjamin Recht
Y. Singer
96
1,234
0
03 Sep 2015
Path-SGD: Path-Normalized Optimization in Deep Neural Networks
Path-SGD: Path-Normalized Optimization in Deep Neural Networks
Behnam Neyshabur
Ruslan Salakhutdinov
Nathan Srebro
ODL
57
305
0
08 Jun 2015
Adam: A Method for Stochastic Optimization
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
844
149,474
0
22 Dec 2014
In Search of the Real Inductive Bias: On the Role of Implicit
  Regularization in Deep Learning
In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning
Behnam Neyshabur
Ryota Tomioka
Nathan Srebro
AI4CE
78
655
0
20 Dec 2014
Margins, Shrinkage, and Boosting
Margins, Shrinkage, and Boosting
Matus Telgarsky
49
73
0
18 Mar 2013
1