Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2105.13462
Cited By
On Linear Stability of SGD and Input-Smoothness of Neural Networks
27 May 2021
Chao Ma
Lexing Ying
MLT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"On Linear Stability of SGD and Input-Smoothness of Neural Networks"
13 / 13 papers shown
Title
Reasoning Bias of Next Token Prediction Training
Pengxiao Lin
Zhongwang Zhang
Zhi-Qin John Xu
LRM
91
1
0
21 Feb 2025
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Zhanpeng Zhou
Mingze Wang
Yuchen Mao
Bingrui Li
Junchi Yan
AAML
62
0
0
14 Oct 2024
High dimensional analysis reveals conservative sharpening and a stochastic edge of stability
Atish Agarwala
Jeffrey Pennington
41
3
0
30 Apr 2024
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization
Kaiyue Wen
Zhiyuan Li
Tengyu Ma
FAtt
36
26
0
20 Jul 2023
How to escape sharp minima with random perturbations
Kwangjun Ahn
Ali Jadbabaie
S. Sra
ODL
29
6
0
25 May 2023
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Kayhan Behdin
Qingquan Song
Aman Gupta
S. Keerthi
Ayan Acharya
Borja Ocejo
Gregory Dexter
Rajiv Khanna
D. Durfee
Rahul Mazumder
AAML
18
7
0
19 Feb 2023
On the Lipschitz Constant of Deep Networks and Double Descent
Matteo Gamba
Hossein Azizpour
Marten Bjorkman
25
7
0
28 Jan 2023
Deep Double Descent via Smooth Interpolation
Matteo Gamba
Erik Englesson
Marten Bjorkman
Hossein Azizpour
59
10
0
21 Sep 2022
On the Implicit Bias in Deep-Learning Algorithms
Gal Vardi
FedML
AI4CE
34
72
0
26 Aug 2022
A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective
Chanwoo Park
Sangdoo Yun
Sanghyuk Chun
AAML
21
32
0
21 Aug 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu
Zhiyuan Li
Sanjeev Arora
FAtt
37
69
0
14 Jun 2022
Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes
Chao Ma
D. Kunin
Lei Wu
Lexing Ying
25
27
0
24 Apr 2022
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
281
2,889
0
15 Sep 2016
1