Adaptive Gradient Quantization for Data-Parallel SGD
Neural Information Processing Systems (NeurIPS), 2020
23 October 2020
Fartash Faghri, Iman Tabrizian, I. Markov, Dan Alistarh, Daniel M. Roy, Ali Ramezani-Kebrya
MQ

Papers citing "Adaptive Gradient Quantization for Data-Parallel SGD"

41 papers shown

Layer-wise Quantization for Quantized Optimistic Dual Averaging
Anh Duc Nguyen, Ilia Markov, Frank Zhengqing Wu, Ali Ramezani-Kebrya, Kimon Antonakopoulos, Dan Alistarh, Volkan Cevher
MQ
20 May 2025

Addressing Label Shift in Distributed Learning via Entropy Regularization
International Conference on Learning Representations (ICLR), 2025
Zhiyuan Wu, Changkyu Choi, Xiangcheng Cao, Volkan Cevher, Ali Ramezani-Kebrya
04 Feb 2025

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
Neural Information Processing Systems (NeurIPS), 2024
Jinda Jia, Cong Xie, Hanlin Lu, Daoce Wang, Hao Feng, ..., Baixi Sun, Yanghua Peng, Zhi-Li Zhang, Xin Liu, Dingwen Tao
MQ
20 Oct 2024

Differentiable Weightless Neural Networks
International Conference on Machine Learning (ICML), 2024
Alan T. L. Bacellar, Zachary Susskind, Mauricio Breternitz Jr., E. John, L. John, P. Lima, F. M. G. França
14 Oct 2024

FedFQ: Federated Learning with Fine-Grained Quantization
Haowei Li, Weiying Xie, Hangyu Ye, Haonan Qin, Shuran Ma, Yunsong Li
FedML, MQ
16 Aug 2024

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu
GNN
09 Apr 2024

Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Hui Li, Wen-Huang Cheng
08 Apr 2024

Optimal and Near-Optimal Adaptive Vector Quantization
Ran Ben-Basat, Y. Ben-Itzhak, Michael Mitzenmacher, S. Vargaftik
MQ
05 Feb 2024

Contractive error feedback for gradient compression
Bingcong Li, Shuai Zheng, Parameswaran Raman, Anshumali Shrivastava, G. Giannakis
13 Dec 2023

Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates
Ahmad Rammal, Kaja Gruntkowska, Nikita Fedin, Eduard A. Gorbunov, Peter Richtárik
15 Oct 2023

CORE: Common Random Reconstruction for Distributed Optimization with Provable Low Communication Complexity
Pengyun Yue, Hanzheng Zhao, Cong Fang, Di He, Liwei Wang, Zhouchen Lin, Song-Chun Zhu
23 Sep 2023

Distributed Extra-gradient with Optimal Complexity and Communication Guarantees
International Conference on Learning Representations (ICLR), 2023
Ali Ramezani-Kebrya, Kimon Antonakopoulos, Igor Krawczuk, Justin Deschenaux, Volkan Cevher
17 Aug 2023

Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
James O'Neill, Sourav Dutta
VLM, MQ
12 Jul 2023

Federated Learning under Covariate Shifts with Generalization Guarantees
Ali Ramezani-Kebrya, Fanghui Liu, Thomas Pethick, Grigorios G. Chrysos, Volkan Cevher
FedML, OOD
08 Jun 2023

Fast Optimal Locally Private Mean Estimation via Random Projections
Neural Information Processing Systems (NeurIPS), 2023
Hilal Asi, Vitaly Feldman, Jelani Nelson, Huy Le Nguyen, Kunal Talwar
FedML
07 Jun 2023

Communication-Efficient Design for Quantized Decentralized Federated Learning
IEEE Transactions on Signal Processing (IEEE TSP), 2023
Lixing Chen, Wei Liu, Yunfei Chen, Weidong Wang
FedML, MQ
15 Mar 2023

FedREP: A Byzantine-Robust, Communication-Efficient and Privacy-Preserving Framework for Federated Learning
Yi-Rui Yang, Kun Wang, Wulu Li
FedML
09 Mar 2023

Quantized Distributed Training of Large Models with Convergence Guarantees
International Conference on Machine Learning (ICML), 2023
I. Markov, Adrian Vladu, Qi Guo, Dan Alistarh
MQ
05 Feb 2023

Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2023
Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, H. Kim, Youngsok Kim, Jinho Lee
GNN
24 Jan 2023

Adaptive Compression for Communication-Efficient Distributed Training
Maksim Makarenko, Elnur Gasanov, Rustem Islamov, Abdurakhmon Sadiev, Peter Richtárik
31 Oct 2022

L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning
Mohammadreza Alimohammadi, I. Markov, Elias Frantar, Dan Alistarh
31 Oct 2022

lo-fi: distributed fine-tuning without communication
Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael G. Rabbat, Ari S. Morcos
19 Oct 2022

Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment
Daegun Yoon, Sangyoon Oh
18 Sep 2022

MixTailor: Mixed Gradient Aggregation for Robust Learning Against Tailored Attacks
Ali Ramezani-Kebrya, Iman Tabrizian, Fartash Faghri, P. Popovski
AAML, FedML
16 Jul 2022

Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees
Neural Information Processing Systems (NeurIPS), 2022
Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Ré, Ce Zhang
AI4CE
02 Jun 2022

Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top
Eduard A. Gorbunov, Samuel Horváth, Peter Richtárik, Gauthier Gidel
AAML
01 Jun 2022

Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient
Xiaoge Deng, Dongsheng Li, Tao Sun, Xicheng Lu
FedML
08 Dec 2021

CGX: Adaptive System Support for Communication-Efficient Deep Learning
I. Markov, Hamidreza Ramezanikebrya, Dan Alistarh
GNN
16 Nov 2021

Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021
Jiangchao Yao, Shengyu Zhang, Yang Yao, Feng Wang, Jianxin Ma, ..., Kun Kuang, Chao-Xiang Wu, Leilei Gan, Jingren Zhou, Hongxia Yang
11 Nov 2021

What Do We Mean by Generalization in Federated Learning?
Honglin Yuan, Warren Morningstar, Lin Ning, K. Singhal
OOD, FedML
27 Oct 2021

NeRV: Neural Representations for Videos
Hao Chen, Bo He, Hanyu Wang, Yixuan Ren, Ser-Nam Lim, Abhinav Shrivastava
26 Oct 2021

Fundamental limits of over-the-air optimization: Are analog schemes optimal?
Shubham K. Jha, Prathamesh Mayekar, Himanshu Tyagi
11 Sep 2021

A Distributed SGD Algorithm with Global Sketching for Deep Learning Training Acceleration
Lingfei Dai, Boyu Diao, Chao Li, Yongjun Xu
13 Aug 2021

Theoretically Better and Numerically Faster Distributed Optimization with Smoothness-Aware Quantization Techniques
Neural Information Processing Systems (NeurIPS), 2021
Bokun Wang, M. Safaryan, Peter Richtárik
MQ
07 Jun 2021

Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
Neural Information Processing Systems (NeurIPS), 2021
Max Ryabinin, Eduard A. Gorbunov, Vsevolod Plokhotnyuk, Gennady Pekhimenko
04 Mar 2021

Lossless Compression of Efficient Private Local Randomizers
International Conference on Machine Learning (ICML), 2021
Vitaly Feldman, Kunal Talwar
24 Feb 2021

Distributed Online Learning for Joint Regret with Communication Constraints
International Conference on Algorithmic Learning Theory (ALT), 2021
Dirk van der Hoeven, Hédi Hadiji, T. Erven
15 Feb 2021

Adaptive Quantization of Model Updates for Communication-Efficient Federated Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Divyansh Jhunjhunwala, Advait Gadhikar, Gauri Joshi, Yonina C. Eldar
FedML, MQ
08 Feb 2021

DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep Learning
Kelly Kostopoulou, Hang Xu, Aritra Dutta, Xin Li, A. Ntoulas, Panos Kalnis
05 Feb 2021

Wyner-Ziv Estimators for Distributed Mean Estimation with Side Information and Optimization
IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2020
Prathamesh Mayekar, Shubham K. Jha, A. Suresh, Himanshu Tyagi
FedML
24 Nov 2020

Communication-Efficient Distributed Deep Learning: A Comprehensive Survey
Zhenheng Tang, Shaoshuai Shi, Wei Wang, Yue Liu, Xiaowen Chu
10 Mar 2020