arXiv 1802.04434 · versions v1, v2, v3 (latest)
signSGD: Compressed Optimisation for Non-Convex Problems
13 February 2018
Jeremy Bernstein, Yu Wang, Kamyar Azizzadenesheli, Anima Anandkumar · FedML · ODL
arXiv (abs) · PDF · HTML · GitHub (86★)
Papers citing "signSGD: Compressed Optimisation for Non-Convex Problems"
50 / 592 papers shown
Hi-SAFE: Hierarchical Secure Aggregation for Lightweight Federated Learning
Hyeong-Gun Joo, Songnam Hong, Seunghwan Lee, Dong-joon Shin · FedML · 310 · 0 · 0 · 24 Nov 2025
Weight-sparse transformers have interpretable circuits
Leo Gao, Achyuta Rajaram, Jacob Coxon, Soham V. Govande, Bowen Baker, Dan Mossing · MILM · 144 · 2 · 0 · 17 Nov 2025
VecComp: Vector Computing via MIMO Digital Over-the-Air Computation
Saeed Razavikia, José Hélio da Cruz Júnior, Carlo Fischione · 52 · 0 · 0 · 04 Nov 2025
Isotropic Curvature Model for Understanding Deep Learning Optimization: Is Gradient Orthogonalization Optimal?
Weijie Su · 68 · 0 · 0 · 01 Nov 2025
Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
Beomhan Baek, Minhak Song, Chulhee Yun · 83 · 0 · 0 · 30 Oct 2025
What Really Matters in Matrix-Whitening Optimizers?
Kevin Frans, Pieter Abbeel, Sergey Levine · 84 · 1 · 0 · 28 Oct 2025
How Muon's Spectral Design Benefits Generalization: A Study on Imbalanced Data
Bhavya Vasudeva, Puneesh Deora, Yize Zhao, Vatsal Sharan, Christos Thrampoulidis · 64 · 0 · 0 · 27 Oct 2025
Unbiased Gradient Low-Rank Projection
Rui Pan, Yang Luo, Yuxing Liu, Yang You, Tong Zhang · 100 · 0 · 0 · 20 Oct 2025
Adam or Gauss-Newton? A Comparative Study In Terms of Basis Alignment and SGD Noise
Bingbin Liu, Rachit Bansal, Depen Morwani, Nikhil Vyas, David Alvarez-Melis, Sham Kakade · 104 · 1 · 0 · 15 Oct 2025
Cautious Weight Decay
Lizhang Chen, Jonathan Li, Kaizhao Liang, Baiyu Su, Cong Xie, Nuo Wang Pierse, Chen Liang, Ni Lao, Qiang Liu · 68 · 0 · 0 · 14 Oct 2025
Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton and Adaptive Methods
Andrey Veprikov, Arman Bolatov, Samuel Horváth, Aleksandr Beznosikov, Martin Takáč, Slavomír Hanzely · ODL · 229 · 0 · 0 · 12 Oct 2025
Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
Filippo Rinaldi, Aniello Panariello, Giacomo Salici, Fengyuan Liu, Marco Ciccone, Angelo Porrello, Simone Calderara · 131 · 0 · 0 · 07 Oct 2025
Randomized Gradient Subspaces for Efficient Large Language Model Training
Sahar Rajabi, Nayeema Nonta, Samanvay Vajpayee, Sirisha Rambhatla · 68 · 0 · 0 · 02 Oct 2025
Non-Euclidean Broximal Point Method: A Blueprint for Geometry-Aware Optimization
Kaja Gruntkowska, Peter Richtárik · 116 · 2 · 0 · 01 Oct 2025
Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning
Yicheng Lang, Yihua Zhang, Chongyu Fan, Changsheng Wang, Jinghan Jia, Sijia Liu · MU · 309 · 0 · 0 · 01 Oct 2025
Per-example gradients: a new frontier for understanding and improving optimizers
Vincent Roulet, Atish Agarwala · 92 · 1 · 0 · 30 Sep 2025
Binary Sparse Coding for Interpretability
Lucia Quirke, Stepan Shabalin, Nora Belrose · 52 · 1 · 0 · 29 Sep 2025
Learned Digital Codes for Over-the-Air Federated Learning
Antonio Tarizzo, M. Kazemi, Deniz Gündüz · 80 · 0 · 0 · 20 Sep 2025
Accelerated Gradient Methods with Biased Gradient Estimates: Risk Sensitivity, High-Probability Guarantees, and Large Deviation Bounds
Mert Gurbuzbalaban, Yasa Syed, Necdet Serhat Aybat · 147 · 0 · 0 · 17 Sep 2025
MERIT: Maximum-normalized Element-wise Ratio for Language Model Large-batch Training
Yang Luo, Zangwei Zheng, Ziheng Qin, Zirui Zhu, Yong Liu, Yang You · ALM · 68 · 0 · 0 · 28 Aug 2025
ANO : Faster is Better in Noisy Landscape
Adrien Kegreisz · ODL · 319 · 0 · 0 · 25 Aug 2025
Deploying Models to Non-participating Clients in Federated Learning without Fine-tuning: A Hypernetwork-based Approach
Yuhao Zhou, Jindi Lv, Yuxin Tian, Dan Si, Qing Ye, Jiancheng Lv · FedML · 92 · 0 · 0 · 18 Aug 2025
Fed-DPRoC:Communication-Efficient Differentially Private and Robust Federated Learning
Yue Xia, Tayyebeh Jahani-Nezhad, Rawad Bitar · FedML · 100 · 0 · 0 · 18 Aug 2025
Communication-Efficient Distributed Asynchronous ADMM
Sagar Shrestha · FedML · 72 · 0 · 0 · 17 Aug 2025
Convergence Analysis of the Lion Optimizer in Centralized and Distributed Settings
Wei Jiang, Lijun Zhang · 92 · 0 · 0 · 17 Aug 2025
Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization
Wei Liu, Anweshit Panda, Ujwal Pandey, Christopher Brissette, Yikang Shen, George M. Slota, Naigang Wang, Jie Chen, Yangyang Xu · 64 · 0 · 0 · 07 Aug 2025
Efficient Machine Unlearning via Influence Approximation
Jiawei Liu, Chenwang Wu, Defu Lian, Enhong Chen · MU · 111 · 0 · 0 · 31 Jul 2025
DeCo-SGD: Joint Optimization of Delay Staleness and Gradient Compression Ratio for Distributed SGD
Rongwei Lu, Jingyan Jiang, Chunyang Li, Haotian Dong, Xingguang Wei, Delin Cai, Zhi Wang · 117 · 1 · 0 · 23 Jul 2025
DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD
Xianbiao Qi, Marco Chen, Wenjie Xiao, Jiaquan Ye, Yelin He, Chun-Guang Li, Zhouchen Lin · OffRL · 101 · 0 · 0 · 23 Jul 2025
Greedy Low-Rank Gradient Compression for Distributed Learning with Convergence Guarantees
Chuyan Chen, Yutong He, Pengrui Li, Weichen Jia, Kun Yuan · 427 · 4 · 0 · 11 Jul 2025
Muon Optimizes Under Spectral Norm Constraints
Lizhang Chen, Jonathan Li, Qiang Liu · 380 · 15 · 0 · 18 Jun 2025
Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
Teodora Srećković, Jonas Geiping, Antonio Orvieto · MoE · 155 · 5 · 0 · 14 Jun 2025
ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations
Amirreza Rouhi, Solmaz Arezoomandan, Knut Peterson, Joseph T. Woods, David Han · VLM · 143 · 11 · 0 · 10 Jun 2025
A Stable Whitening Optimizer for Efficient Neural Network Training
Kevin Frans, Sergey Levine, Pieter Abbeel · 219 · 3 · 0 · 08 Jun 2025
LADSG: Label-Anonymized Distillation and Similar Gradient Substitution for Label Privacy in Vertical Federated Learning
Zeyu Yan, Yifei Yao, Xuanbing Wen, Shixiong Zhang, Juli Zhang, Kai Fan · AAML · 229 · 0 · 0 · 07 Jun 2025
Tight analyses of first-order methods with error feedback
Daniel Berg Thomsen, Adrien B. Taylor, Hadrien Hendrikx · 199 · 1 · 0 · 05 Jun 2025
FERRET: Private Deep Learning Faster And Better Than DPSGD
David Zagardo · FedML · 83 · 0 · 0 · 04 Jun 2025
Lions and Muons: Optimization via Stochastic Frank-Wolfe
Maria-Eleni Sfyraki, Jun-Kun Wang · 578 · 14 · 0 · 04 Jun 2025
Provable Reinforcement Learning from Human Feedback with an Unknown Link Function
Qining Zhang, Lei Ying · 196 · 0 · 0 · 03 Jun 2025
Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism
Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, Alexander Long · GNN · 309 · 0 · 0 · 02 Jun 2025
Why Gradients Rapidly Increase Near the End of Training
Aaron Defazio · 120 · 6 · 0 · 02 Jun 2025
Taming LLMs by Scaling Learning Rates with Gradient Grouping
Siyuan Li, Juanxi Tian, Zedong Wang, Xin Jin, Zicheng Liu, Wentao Zhang, Dan Xu · 182 · 0 · 0 · 01 Jun 2025
On the Interaction of Noise, Compression Role, and Adaptivity under $(L_0, L_1)$-Smoothness: An SDE-based Approach
Enea Monzio Compagnoni, Rustem Islamov, Antonio Orvieto, Eduard A. Gorbunov · 114 · 1 · 0 · 30 May 2025
GradPower: Powering Gradients for Faster Language Model Pre-Training
Mingze Wang, Jinbo Wang, Jiaqi Zhang, Wei Wang, Peng Pei, Xunliang Cai, Weinan E, Lei Wu · 169 · 0 · 0 · 30 May 2025
SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training
Yehonathan Refael, Guy Smorodinsky, Tom Tirer, Ofir Lindenbaum · 133 · 5 · 0 · 30 May 2025
Efficient AllReduce with Stragglers
Arjun Devraj, Eric Ding, Abhishek Vijaya Kumar, Robert Kleinberg, Rachee Singh · 220 · 0 · 0 · 29 May 2025
In Search of Adam's Secret Sauce
Antonio Orvieto, Robert Gower · 251 · 10 · 0 · 27 May 2025
Convergence Analysis of Asynchronous Federated Learning with Gradient Compression for Non-Convex Optimization
Diying Yang, Yingwei Hou, Danyang Xiao · FedML · 254 · 0 · 0 · 28 Apr 2025
AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane · ODL · 352 · 0 · 0 · 22 Apr 2025
Tin-Tin: Towards Tiny Learning on Tiny Devices with Integer-based Neural Network Training
Yi Hu, Jinhang Zuo, Eddie Zhang, Bob Iannucci, Carlee Joe-Wong · 223 · 1 · 0 · 13 Apr 2025