arXiv: 2307.03500
DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification
7 July 2023
Daegun Yoon, Sangyoon Oh
Papers citing "DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification" (3 of 3 papers shown)
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, ..., Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, K. Gopalakrishnan
21 Apr 2021
An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems
A. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini
26 Jan 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019