arXiv: 2002.09572
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
International Conference on Learning Representations (ICLR), 2020
21 February 2020
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Dong Wang, Krzysztof J. Geras
Papers citing "The Break-Even Point on Optimization Trajectories of Deep Neural Networks" (50 of 134 papers shown)
- Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks. Florian A. Hölzl, Daniel Rueckert, Georgios Kaissis. 29 Oct 2025.
- BSFA: Leveraging the Subspace Dichotomy to Accelerate Neural Network Training. Wenjie Zhou, Bohan Wang, Wei Chen, Xueqi Cheng. 29 Oct 2025.
- Growing Winning Subnetworks, Not Pruning Them: A Paradigm for Density Discovery in Sparse Neural Networks. Qihang Yao, Constantine Dovrolis. 30 Sep 2025.
- Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region. Shuang Liang, Guido Montúfar. 29 Sep 2025.
- Dynamics of Learning: Generative Schedules from Latent ODEs. Matt L. Sampson, Peter Melchior. 27 Sep 2025.
- VASSO: Variance Suppression for Sharpness-Aware Minimization. Bingcong Li, Yilang Zhang, G. Giannakis. 02 Sep 2025.
- Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility. Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou, Tolga Birdal. 23 Jul 2025.
- Reactivation: Empirical NTK Dynamics Under Task Shifts. Y. Liu, Zixuan Chen, Zirui Zhang, Yufei Liu, Giulia Lanzillotta. 21 Jul 2025.
- Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful. Martin Marek, Sanae Lotfi, Aditya Somasundaram, A. Wilson, Micah Goldblum. 09 Jul 2025.
- Hidden Breakthroughs in Language Model Training. Sara Kangaslahti, Elan Rosenfeld, Naomi Saphra. 18 Jun 2025.
- Constant Stepsize Local GD for Logistic Regression: Acceleration by Instability. M. Crawshaw, Blake Woodworth, Mingrui Liu. 16 Jun 2025.
- Variational Learning Finds Flatter Solutions at the Edge of Stability. Avrajit Ghosh, Bai Cong, Rio Yokota, S. Ravishankar, Rongrong Wang, Molei Tao, Mohammad Emtiyaz Khan, Thomas Möllenhoff. 15 Jun 2025.
- Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models? Sigma Jahan, Mohammad Masudur Rahman. 09 Jun 2025.
- Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More. Geonhui Yoo, Minhak Song, Chulhee Yun. 07 Jun 2025.
- Adaptive Preconditioners Trigger Loss Spikes in Adam. Zhiwei Bai, Zhangchen Zhou, Jiajie Zhao, Xiaolong Li, Zhiyu Li, Feiyu Xiong, Hongkang Yang, Yaoyu Zhang, Z. Xu. 05 Jun 2025.
- GradPower: Powering Gradients for Faster Language Model Pre-Training. Mingze Wang, Jinbo Wang, Jiaqi Zhang, Wei Wang, Peng Pei, Xunliang Cai, Weinan E, Lei Wu. 30 May 2025.
- Understanding Differential Transformer Unchains Pretrained Self-Attentions. Chaerin Kong, Jiho Jang, Nojun Kwak. 22 May 2025.
- New Evidence of the Two-Phase Learning Dynamics of Neural Networks. Zhanpeng Zhou, Yongyi Yang, Mahito Sugiyama, Junchi Yan. 20 May 2025.
- Towards Quantifying the Hessian Structure of Neural Networks. Zhaorui Dong, Yushun Zhang, Jianfeng Yao. 05 May 2025.
- How Effective Can Dropout Be in Multiple Instance Learning? Wenhui Zhu, Peijie Qiu, Xiwen Chen, Zhangsihao Yang, Aristeidis Sotiras, Abolfazl Razi, Yanjie Wang. 21 Apr 2025.
- Enlightenment Period Improving DNN Performance. Tiantian Liu, Meng Wan, Jue Wang. 02 Apr 2025.
- Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition. Computer Vision and Pattern Recognition (CVPR), 2025. Chengxiang Huang, Yake Wei, Zequn Yang, D. Hu. 24 Mar 2025.
- A Minimalist Example of Edge-of-Stability and Progressive Sharpening. Liming Liu, Zixuan Zhang, S. Du, T. Zhao. 04 Mar 2025.
- The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training. Jinbo Wang, Mingze Wang, Zhanpeng Zhou, Junchi Yan, Weinan E, Lei Wu. 26 Feb 2025.
- Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos. Dayal Singh Kalra, Tianyu He, M. Barkeshli. 17 Feb 2025.
- Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks. Pierfrancesco Beneventano, Blake Woodworth. 15 Jan 2025.
- Where Do Large Learning Rates Lead Us? Neural Information Processing Systems (NeurIPS), 2024. Ildus Sadrtdinov, M. Kodryan, Eduard Pokonechny, E. Lobacheva, Dmitry Vetrov. 29 Oct 2024.
- Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training. International Conference on Learning Representations (ICLR), 2024. Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan. 14 Oct 2024.
- Fisher Information guided Purification against Backdoor Attacks. Conference on Computer and Communications Security (CCS), 2024. Nazmul Karim, Abdullah Al Arafat, Adnan Siraj Rakin, Zhishan Guo, Nazanin Rahnavard. 01 Sep 2024.
- Can Optimization Trajectories Explain Multi-Task Transfer? David Mueller, Mark Dredze, Nicholas Andrews. 26 Aug 2024.
- Stepping on the Edge: Curvature Aware Learning Rate Tuners. Vincent Roulet, Atish Agarwala, Jean-Bastien Grill, Grzegorz Swirszcz, Mathieu Blondel, Fabian Pedregosa. 08 Jul 2024.
- Flat Posterior Does Matter For Bayesian Model Averaging. Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song. 21 Jun 2024.
- Does SGD really happen in tiny subspaces? Minhak Song, Kwangjun Ahn, Chulhee Yun. 25 May 2024.
- SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data. Sakshi Choudhary, Sai Aparna Aketi, Kaushik Roy. 22 May 2024.
- Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks. Xin-Chun Li, Jinli Tang, Bo Zhang, Lan Li, De-Chuan Zhan. 21 May 2024.
- High dimensional analysis reveals conservative sharpening and a stochastic edge of stability. Atish Agarwala, Jeffrey Pennington. 30 Apr 2024.
- Unifying Low Dimensional Observations in Deep Learning Through the Deep Linear Unconstrained Feature Model. Connall Garrod, Jonathan P. Keating. 09 Apr 2024.
- Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning. Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto. 27 Feb 2024.
- Deconstructing the Goldilocks Zone of Neural Network Initialization. International Conference on Machine Learning (ICML), 2024. Artem Vysogorets, Anna Dawid, Julia Kempe. 05 Feb 2024.
- A Precise Characterization of SGD Stability Using Loss Surface Geometry. International Conference on Learning Representations (ICLR), 2024. Gregory Dexter, Borja Ocejo, S. Keerthi, Aman Gupta, Ayan Acharya, Rajiv Khanna. 22 Jan 2024.
- Investigation into the Training Dynamics of Learned Optimizers. International Conference on Agents and Artificial Intelligence (ICAART), 2023. Jan Sobotka, Petr Simánek, Daniel Vasata. 12 Dec 2023.
- Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling. International Conference on Machine Learning (ICML), 2023. Mingze Wang, Zeping Min, Lei Wu. 24 Nov 2023.
- A Coefficient Makes SVRG Effective. Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu. 09 Nov 2023.
- Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization. Elan Rosenfeld, Andrej Risteski. 07 Nov 2023.
- An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent. Zhao Song, Chiwun Yang. 17 Oct 2023.
- From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression. Xuxing Chen, Krishnakumar Balasubramanian, Promit Ghosal, Bhavya Agrawalla. 02 Oct 2023.
- Enhancing Sharpness-Aware Optimization Through Variance Suppression. Neural Information Processing Systems (NeurIPS), 2023. Bingcong Li, G. Giannakis. 27 Sep 2023.
- Sharpness-Aware Minimization and the Edge of Stability. Journal of Machine Learning Research (JMLR), 2023. Philip M. Long, Peter L. Bartlett. 21 Sep 2023.
- Towards Last-layer Retraining for Group Robustness with Fewer Annotations. Neural Information Processing Systems (NeurIPS), 2023. Tyler LaBonte, Vidya Muthukumar, Abhishek Kumar. 15 Sep 2023.
- Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs. International Conference on Learning Representations (ICLR), 2023. Angelica Chen, Ravid Schwartz-Ziv, Dong Wang, Matthew L. Leavitt, Naomi Saphra. 13 Sep 2023.
Page 1 of 3