ResearchTrend.AI

The Geometry of Sign Gradient Descent
19 February 2020 · arXiv: 2002.08056
Lukas Balles
Fabian Pedregosa
Nicolas Le Roux

Papers citing "The Geometry of Sign Gradient Descent" (27 papers)
A Tale of Two Geometries: Adaptive Optimizers and Non-Euclidean Descent
Shuo Xie
Tianhao Wang
Beining Wu
Zhiyuan Li
25 Nov 2025

Non-Euclidean Broximal Point Method: A Blueprint for Geometry-Aware Optimization
Kaja Gruntkowska
Peter Richtárik
01 Oct 2025

Per-example gradients: a new frontier for understanding and improving optimizers
Vincent Roulet
Atish Agarwala
30 Sep 2025

Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization
Xinyu Luo
Cedar Site Bai
Bolian Li
Petros Drineas
Ruqi Zhang
Brian Bullins
07 Jun 2025

Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness
Thomas Pethick
Wanyun Xie
Mete Erdogan
Kimon Antonakopoulos
Tony Silveti-Falls
Volkan Cevher
02 Jun 2025

LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning
Junyu Chen
Junzhuo Li
Zhen Peng
Wenjie Wang
Yuxiang Ren
Long Shi
Xuming Hu
24 May 2025

Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)
Artem Riabinin
Egor Shulgin
Kaja Gruntkowska
Peter Richtárik
19 May 2025

FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Philip Zmushko
Aleksandr Beznosikov
Martin Takáč
Samuel Horváth
12 Nov 2024

Understanding Adam Requires Better Rotation Dependent Assumptions
Tianyue H. Zhang
Lucas Maes
Alexia Jolicoeur-Martineau
Damien Scieur
Simon Lacoste-Julien
Charles Guille-Escuret
25 Oct 2024

A Mirror Descent Perspective of Smoothed Sign Descent
Conference on Uncertainty in Artificial Intelligence (UAI), 2024
Shuyang Wang
Diego Klabjan
18 Oct 2024

Faster Acceleration for Steepest Descent
Annual Conference on Computational Learning Theory (COLT), 2024
Site Bai
Brian Bullins
28 Sep 2024

Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao
Depen Morwani
David Brandfonbrener
Nikhil Vyas
Sham Kakade
10 Jul 2024

A New Perspective on Shampoo's Preconditioner
Depen Morwani
Itai Shapira
Nikhil Vyas
Eran Malach
Sham Kakade
Lucas Janson
25 Jun 2024

Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Yuxing Liu
Boyao Wang
Tong Zhang
21 Jun 2024

The Implicit Bias of Adam on Separable Data
Neural Information Processing Systems (NeurIPS), 2024
Chenyang Zhang
Difan Zou
Yuan Cao
15 Jun 2024

Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning
Mohamed Elsayed
Homayoon Farrahi
Felix Dangel
A. Rupam Mahmood
05 Jun 2024

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
Frederik Kunstner
Robin Yadav
Alan Milligan
Mark Schmidt
Alberto Bietti
29 Feb 2024

Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Wenhua Cheng
Weiwei Zhang
Haihao Shen
Yiyang Cai
Xin He
Kaokao Lv
Yi Liu
11 Sep 2023

On the Implicit Bias of Adam
International Conference on Machine Learning (ICML), 2023
M. D. Cattaneo
Jason M. Klusowski
Boris Shigida
31 Aug 2023

On Neural Network approximation of ideal adversarial attack and convergence of adversarial training
SIAM Journal on Mathematics of Data Science (SIMODS), 2023
Rajdeep Haldar
Qifan Song
30 Jul 2023

SignSVRG: fixing SignSGD via variance reduction
Evgenii Chzhen
S. Schechtman
22 May 2023

Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
International Conference on Learning Representations (ICLR), 2023
Frederik Kunstner
Jacques Chen
J. Lavington
Mark Schmidt
27 Apr 2023

Nonlinear gradient mappings and stochastic optimization: A general framework with applications to heavy-tail noise
SIAM Journal on Optimization (SIAM J. Optim.), 2022
D. Jakovetić
Dragana Bajović
Anit Kumar Sahu
S. Kar
Nemanja Milošević
Dusan Stamenkovic
06 Apr 2022

Revealing and Protecting Labels in Distributed Training
Neural Information Processing Systems (NeurIPS), 2021
Trung D. Q. Dang
Om Thakkar
Swaroop Indra Ramaswamy
Rajiv Mathews
Peter Chin
Françoise Beaufays
31 Oct 2021

Hard to Forget: Poisoning Attacks on Certified Machine Unlearning
Neil G. Marchant
Benjamin I. P. Rubinstein
Scott Alfeld
17 Sep 2021

On Faster Convergence of Scaled Sign Gradient Descent
Xiuxian Li
Kuo-Yi Lin
Li Li
Yiguang Hong
Jie-bin Chen
04 Sep 2021

Online Training of Spiking Recurrent Neural Networks with Phase-Change Memory Synapses
Yiğit Demirağ
Charlotte Frenkel
Melika Payvand
Giacomo Indiveri
04 Aug 2021