arXiv: 2002.09572
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
International Conference on Learning Representations (ICLR), 2020
21 February 2020
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Dong Wang, Krzysztof J. Geras
Papers citing "The Break-Even Point on Optimization Trajectories of Deep Neural Networks" (50 of 134 papers shown)
- Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks. Florian A. Hölzl, Daniel Rueckert, Georgios Kaissis. 29 Oct 2025.
- BSFA: Leveraging the Subspace Dichotomy to Accelerate Neural Network Training. Wenjie Zhou, Bohan Wang, Wei Chen, Xueqi Cheng. 29 Oct 2025.
- Growing Winning Subnetworks, Not Pruning Them: A Paradigm for Density Discovery in Sparse Neural Networks. Qihang Yao, Constantine Dovrolis. 30 Sep 2025.
- Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region. Shuang Liang, Guido Montúfar. 29 Sep 2025.
- Dynamics of Learning: Generative Schedules from Latent ODEs. Matt L. Sampson, Peter Melchior. 27 Sep 2025.
- VASSO: Variance Suppression for Sharpness-Aware Minimization. Bingcong Li, Yilang Zhang, G. Giannakis. 02 Sep 2025.
- Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility. Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou, Tolga Birdal. 23 Jul 2025.
- Reactivation: Empirical NTK Dynamics Under Task Shifts. Y. Liu, Zixuan Chen, Zirui Zhang, Yufei Liu, Giulia Lanzillotta. 21 Jul 2025.
- Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful. Martin Marek, Sanae Lotfi, Aditya Somasundaram, A. Wilson, Micah Goldblum. 09 Jul 2025.
- Hidden Breakthroughs in Language Model Training. Sara Kangaslahti, Elan Rosenfeld, Naomi Saphra. 18 Jun 2025.
- Constant Stepsize Local GD for Logistic Regression: Acceleration by Instability. M. Crawshaw, Blake Woodworth, Mingrui Liu. 16 Jun 2025.
- Variational Learning Finds Flatter Solutions at the Edge of Stability. Avrajit Ghosh, Bai Cong, Rio Yokota, S. Ravishankar, Rongrong Wang, Molei Tao, Mohammad Emtiyaz Khan, Thomas Möllenhoff. 15 Jun 2025.
- Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models? Sigma Jahan, Mohammad Masudur Rahman. 09 Jun 2025.
- Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More. Geonhui Yoo, Minhak Song, Chulhee Yun. 07 Jun 2025.
- Adaptive Preconditioners Trigger Loss Spikes in Adam. Zhiwei Bai, Zhangchen Zhou, Jiajie Zhao, Xiaolong Li, Zhiyu Li, Feiyu Xiong, Hongkang Yang, Yaoyu Zhang, Z. Xu. 05 Jun 2025.
- GradPower: Powering Gradients for Faster Language Model Pre-Training. Mingze Wang, Jinbo Wang, Jiaqi Zhang, Wei Wang, Peng Pei, Xunliang Cai, Weinan E, Lei Wu. 30 May 2025.
- Understanding Differential Transformer Unchains Pretrained Self-Attentions. Chaerin Kong, Jiho Jang, Nojun Kwak. 22 May 2025.
- New Evidence of the Two-Phase Learning Dynamics of Neural Networks. Zhanpeng Zhou, Yongyi Yang, Mahito Sugiyama, Junchi Yan. 20 May 2025.
- Towards Quantifying the Hessian Structure of Neural Networks. Zhaorui Dong, Yushun Zhang, Jianfeng Yao. 05 May 2025.
- How Effective Can Dropout Be in Multiple Instance Learning? Wenhui Zhu, Peijie Qiu, Xiwen Chen, Zhangsihao Yang, Aristeidis Sotiras, Abolfazl Razi, Yanjie Wang. 21 Apr 2025.
- Enlightenment Period Improving DNN Performance. Tiantian Liu, Meng Wan, Jue Wang. 02 Apr 2025.
- Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition. Computer Vision and Pattern Recognition (CVPR), 2025. Chengxiang Huang, Yake Wei, Zequn Yang, D. Hu. 24 Mar 2025.
- A Minimalist Example of Edge-of-Stability and Progressive Sharpening. Liming Liu, Zixuan Zhang, S. Du, T. Zhao. 04 Mar 2025.
- The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training. Jinbo Wang, Mingze Wang, Zhanpeng Zhou, Junchi Yan, Weinan E, Lei Wu. 26 Feb 2025.
- Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos. Dayal Singh Kalra, Tianyu He, M. Barkeshli. 17 Feb 2025.
- Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks. Pierfrancesco Beneventano, Blake Woodworth. 15 Jan 2025.
- Where Do Large Learning Rates Lead Us? Neural Information Processing Systems (NeurIPS), 2024. Ildus Sadrtdinov, M. Kodryan, Eduard Pokonechny, E. Lobacheva, Dmitry Vetrov. 29 Oct 2024.
- Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training. International Conference on Learning Representations (ICLR), 2024. Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan. 14 Oct 2024.
- Fisher Information guided Purification against Backdoor Attacks. Conference on Computer and Communications Security (CCS), 2024. Nazmul Karim, Abdullah Al Arafat, Adnan Siraj Rakin, Zhishan Guo, Nazanin Rahnavard. 01 Sep 2024.
- Can Optimization Trajectories Explain Multi-Task Transfer? David Mueller, Mark Dredze, Nicholas Andrews. 26 Aug 2024.
- Stepping on the Edge: Curvature Aware Learning Rate Tuners. Vincent Roulet, Atish Agarwala, Jean-Bastien Grill, Grzegorz Swirszcz, Mathieu Blondel, Fabian Pedregosa. 08 Jul 2024.
- Flat Posterior Does Matter For Bayesian Model Averaging. Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song. 21 Jun 2024.
- Does SGD really happen in tiny subspaces? Minhak Song, Kwangjun Ahn, Chulhee Yun. 25 May 2024.
- SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data. Sakshi Choudhary, Sai Aparna Aketi, Kaushik Roy. 22 May 2024.
- Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks. Xin-Chun Li, Jinli Tang, Bo Zhang, Lan Li, De-Chuan Zhan. 21 May 2024.
- High dimensional analysis reveals conservative sharpening and a stochastic edge of stability. Atish Agarwala, Jeffrey Pennington. 30 Apr 2024.
- Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model. Connall Garrod, Jonathan P. Keating. 09 Apr 2024.
- Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning. Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto. 27 Feb 2024.
- Deconstructing the Goldilocks Zone of Neural Network Initialization. International Conference on Machine Learning (ICML), 2024. Artem Vysogorets, Anna Dawid, Julia Kempe. 05 Feb 2024.
- A Precise Characterization of SGD Stability Using Loss Surface Geometry. International Conference on Learning Representations (ICLR), 2024. Gregory Dexter, Borja Ocejo, S. Keerthi, Aman Gupta, Ayan Acharya, Rajiv Khanna. 22 Jan 2024.
- Investigation into the Training Dynamics of Learned Optimizers. International Conference on Agents and Artificial Intelligence (ICAART), 2023. Jan Sobotka, Petr Simánek, Daniel Vasata. 12 Dec 2023.
- Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling. International Conference on Machine Learning (ICML), 2023. Mingze Wang, Zeping Min, Lei Wu. 24 Nov 2023.
- A Coefficient Makes SVRG Effective. Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu. 09 Nov 2023.
- Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization. Elan Rosenfeld, Andrej Risteski. 07 Nov 2023.
- An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent. Zhao Song, Chiwun Yang. 17 Oct 2023.
- From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression. Xuxing Chen, Krishnakumar Balasubramanian, Promit Ghosal, Bhavya Agrawalla. 02 Oct 2023.
- Enhancing Sharpness-Aware Optimization Through Variance Suppression. Neural Information Processing Systems (NeurIPS), 2023. Bingcong Li, G. Giannakis. 27 Sep 2023.
- Sharpness-Aware Minimization and the Edge of Stability. Journal of Machine Learning Research (JMLR), 2023. Philip M. Long, Peter L. Bartlett. 21 Sep 2023.
- Towards Last-layer Retraining for Group Robustness with Fewer Annotations. Neural Information Processing Systems (NeurIPS), 2023. Tyler LaBonte, Vidya Muthukumar, Abhishek Kumar. 15 Sep 2023.
- Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs. International Conference on Learning Representations (ICLR), 2023. Angelica Chen, Ravid Schwartz-Ziv, Dong Wang, Matthew L. Leavitt, Naomi Saphra. 13 Sep 2023.
Page 1 of 3