arXiv:2002.08056
The Geometry of Sign Gradient Descent
19 February 2020
Lukas Balles
Fabian Pedregosa
Nicolas Le Roux
ODL
Papers citing "The Geometry of Sign Gradient Descent" (27 papers shown)
A Tale of Two Geometries: Adaptive Optimizers and Non-Euclidean Descent
Shuo Xie
Tianhao Wang
Beining Wu
Zhiyuan Li
227
2
0
25 Nov 2025
Non-Euclidean Broximal Point Method: A Blueprint for Geometry-Aware Optimization
Kaja Gruntkowska
Peter Richtárik
208
2
0
01 Oct 2025
Per-example gradients: a new frontier for understanding and improving optimizers
Vincent Roulet
Atish Agarwala
153
1
0
30 Sep 2025
Stacey: Promoting Stochastic Steepest Descent via Accelerated ℓ_p-Smooth Nonconvex Optimization
Xinyu Luo
Cedar Site Bai
Bolian Li
Petros Drineas
Ruqi Zhang
Brian Bullins
250
2
0
07 Jun 2025
Generalized Gradient Norm Clipping & Non-Euclidean (L_0, L_1)-Smoothness
Thomas Pethick
Wanyun Xie
Mete Erdogan
Kimon Antonakopoulos
Tony Silveti-Falls
Volkan Cevher
345
7
0
02 Jun 2025
LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
Junyu Chen
Junzhuo Li
Zhen Peng
Wenjie Wang
Yuxiang Ren
Long Shi
Xuming Hu
MQ
275
3
0
24 May 2025
Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)
Artem Riabinin
Egor Shulgin
Kaja Gruntkowska
Peter Richtárik
AI4CE
434
28
0
19 May 2025
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Philip Zmushko
Aleksandr Beznosikov
Martin Takáč
Samuel Horváth
332
6
0
12 Nov 2024
Understanding Adam Requires Better Rotation Dependent Assumptions
Tianyue H. Zhang
Lucas Maes
Alexia Jolicoeur-Martineau
Damien Scieur
Simon Lacoste-Julien
Charles Guille-Escuret
323
9
0
25 Oct 2024
A Mirror Descent Perspective of Smoothed Sign Descent
Conference on Uncertainty in Artificial Intelligence (UAI), 2024
Shuyang Wang
Diego Klabjan
324
2
0
18 Oct 2024
Faster Acceleration for Steepest Descent
Annual Conference Computational Learning Theory (COLT), 2024
Site Bai
Brian Bullins
ODL
406
1
0
28 Sep 2024
Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao
Depen Morwani
David Brandfonbrener
Nikhil Vyas
Sham Kakade
473
40
0
10 Jul 2024
A New Perspective on Shampoo's Preconditioner
Depen Morwani
Itai Shapira
Nikhil Vyas
Eran Malach
Sham Kakade
Lucas Janson
337
38
0
25 Jun 2024
Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Yuxing Liu
Boyao Wang
Tong Zhang
283
0
0
21 Jun 2024
The Implicit Bias of Adam on Separable Data
Neural Information Processing Systems (NeurIPS), 2024
Chenyang Zhang
Difan Zou
Yuan Cao
AI4CE
300
22
0
15 Jun 2024
Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning
Mohamed Elsayed
Homayoon Farrahi
Felix Dangel
A. Rupam Mahmood
454
7
0
05 Jun 2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner
Robin Yadav
Alan Milligan
Mark Schmidt
Alberto Bietti
379
70
0
29 Feb 2024
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Wenhua Cheng
Weiwei Zhang
Haihao Shen
Yiyang Cai
Xin He
Kaokao Lv
Yi Liu
MQ
561
35
0
11 Sep 2023
On the Implicit Bias of Adam
International Conference on Machine Learning (ICML), 2023
M. D. Cattaneo
Jason M. Klusowski
Boris Shigida
474
25
0
31 Aug 2023
On Neural Network approximation of ideal adversarial attack and convergence of adversarial training
SIAM Journal on Mathematics of Data Science (SIMODS), 2023
Rajdeep Haldar
Qifan Song
AAML
192
0
0
30 Jul 2023
SignSVRG: fixing SignSGD via variance reduction
Evgenii Chzhen
S. Schechtman
332
6
0
22 May 2023
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
International Conference on Learning Representations (ICLR), 2023
Frederik Kunstner
Jacques Chen
J. Lavington
Mark Schmidt
296
109
0
27 Apr 2023
Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise
SIAM Journal on Optimization (SIAM J. Optim.), 2022
D. Jakovetić
Dragana Bajović
Anit Kumar Sahu
S. Kar
Nemanja Milošević
Dusan Stamenkovic
220
22
0
06 Apr 2022
Revealing and Protecting Labels in Distributed Training
Neural Information Processing Systems (NeurIPS), 2021
Trung D. Q. Dang
Om Thakkar
Swaroop Indra Ramaswamy
Rajiv Mathews
Peter Chin
Françoise Beaufays
120
31
0
31 Oct 2021
Hard to Forget: Poisoning Attacks on Certified Machine Unlearning
Neil G. Marchant
Benjamin I. P. Rubinstein
Scott Alfeld
MU
AAML
262
92
0
17 Sep 2021
On Faster Convergence of Scaled Sign Gradient Descent
Xiuxian Li
Kuo-Yi Lin
Li Li
Yiguang Hong
Jie-bin Chen
ODL
191
21
0
04 Sep 2021
Online Training of Spiking Recurrent Neural Networks with Phase-Change Memory Synapses
Yiğit Demirağ
Charlotte Frenkel
Melika Payvand
Giacomo Indiveri
314
18
0
04 Aug 2021