arXiv:2002.08056
The Geometry of Sign Gradient Descent
19 February 2020
Lukas Balles
Fabian Pedregosa
Nicolas Le Roux
ODL
Papers citing "The Geometry of Sign Gradient Descent" (27 papers shown)
A Tale of Two Geometries: Adaptive Optimizers and Non-Euclidean Descent
Shuo Xie
Tianhao Wang
Beining Wu
Zhiyuan Li
227
2
0
25 Nov 2025
Non-Euclidean Broximal Point Method: A Blueprint for Geometry-Aware Optimization
Kaja Gruntkowska
Peter Richtárik
208
2
0
01 Oct 2025
Per-example gradients: a new frontier for understanding and improving optimizers
Vincent Roulet
Atish Agarwala
153
1
0
30 Sep 2025
Stacey: Promoting Stochastic Steepest Descent via Accelerated ℓ_p-Smooth Nonconvex Optimization
Xinyu Luo
Cedar Site Bai
Bolian Li
Petros Drineas
Ruqi Zhang
Brian Bullins
250
2
0
07 Jun 2025
Generalized Gradient Norm Clipping & Non-Euclidean (L_0, L_1)-Smoothness
Thomas Pethick
Wanyun Xie
Mete Erdogan
Kimon Antonakopoulos
Tony Silveti-Falls
Volkan Cevher
345
7
0
02 Jun 2025
LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
Junyu Chen
Junzhuo Li
Zhen Peng
Wenjie Wang
Yuxiang Ren
Long Shi
Xuming Hu
MQ
275
3
0
24 May 2025
Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)
Artem Riabinin
Egor Shulgin
Kaja Gruntkowska
Peter Richtárik
AI4CE
434
28
0
19 May 2025
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Philip Zmushko
Aleksandr Beznosikov
Martin Takáč
Samuel Horváth
332
6
0
12 Nov 2024
Understanding Adam Requires Better Rotation Dependent Assumptions
Tianyue H. Zhang
Lucas Maes
Alexia Jolicoeur-Martineau
Damien Scieur
Simon Lacoste-Julien
Charles Guille-Escuret
323
9
0
25 Oct 2024
A Mirror Descent Perspective of Smoothed Sign Descent
Conference on Uncertainty in Artificial Intelligence (UAI), 2024
Shuyang Wang
Diego Klabjan
324
2
0
18 Oct 2024
Faster Acceleration for Steepest Descent
Annual Conference Computational Learning Theory (COLT), 2024
Site Bai
Brian Bullins
ODL
406
1
0
28 Sep 2024
Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao
Depen Morwani
David Brandfonbrener
Nikhil Vyas
Sham Kakade
473
40
0
10 Jul 2024
A New Perspective on Shampoo's Preconditioner
Depen Morwani
Itai Shapira
Nikhil Vyas
Eran Malach
Sham Kakade
Lucas Janson
337
38
0
25 Jun 2024
Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Yuxing Liu
Boyao Wang
Tong Zhang
283
0
0
21 Jun 2024
The Implicit Bias of Adam on Separable Data
Neural Information Processing Systems (NeurIPS), 2024
Chenyang Zhang
Difan Zou
Yuan Cao
AI4CE
300
22
0
15 Jun 2024
Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning
Mohamed Elsayed
Homayoon Farrahi
Felix Dangel
A. Rupam Mahmood
454
7
0
05 Jun 2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner
Robin Yadav
Alan Milligan
Mark Schmidt
Alberto Bietti
379
70
0
29 Feb 2024
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Wenhua Cheng
Weiwei Zhang
Haihao Shen
Yiyang Cai
Xin He
Kaokao Lv
Yi Liu
MQ
561
35
0
11 Sep 2023
On the Implicit Bias of Adam
International Conference on Machine Learning (ICML), 2023
M. D. Cattaneo
Jason M. Klusowski
Boris Shigida
474
25
0
31 Aug 2023
On Neural Network approximation of ideal adversarial attack and convergence of adversarial training
SIAM Journal on Mathematics of Data Science (SIMODS), 2023
Rajdeep Haldar
Qifan Song
AAML
192
0
0
30 Jul 2023
SignSVRG: fixing SignSGD via variance reduction
Evgenii Chzhen
S. Schechtman
332
6
0
22 May 2023
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
International Conference on Learning Representations (ICLR), 2023
Frederik Kunstner
Jacques Chen
J. Lavington
Mark Schmidt
296
109
0
27 Apr 2023
Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise
SIAM Journal on Optimization (SIAM J. Optim.), 2022
D. Jakovetić
Dragana Bajović
Anit Kumar Sahu
S. Kar
Nemanja Milošević
Dusan Stamenkovic
220
22
0
06 Apr 2022
Revealing and Protecting Labels in Distributed Training
Neural Information Processing Systems (NeurIPS), 2021
Trung D. Q. Dang
Om Thakkar
Swaroop Indra Ramaswamy
Rajiv Mathews
Peter Chin
Françoise Beaufays
120
31
0
31 Oct 2021
Hard to Forget: Poisoning Attacks on Certified Machine Unlearning
Neil G. Marchant
Benjamin I. P. Rubinstein
Scott Alfeld
MU
AAML
262
92
0
17 Sep 2021
On Faster Convergence of Scaled Sign Gradient Descent
Xiuxian Li
Kuo-Yi Lin
Li Li
Yiguang Hong
Jie-bin Chen
ODL
191
21
0
04 Sep 2021
Online Training of Spiking Recurrent Neural Networks with Phase-Change Memory Synapses
Yiğit Demirağ
Charlotte Frenkel
Melika Payvand
Giacomo Indiveri
314
18
0
04 Aug 2021