Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.18313
Cited By
PLUMAGE: Probabilistic Low rank Unbiased Min Variance Gradient Estimator for Efficient Large Model Training
23 May 2025
Matan Haroush
Daniel Soudry
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"PLUMAGE: Probabilistic Low rank Unbiased Min Variance Gradient Estimator for Efficient Large Model Training"
27 / 27 papers shown
Title
CompAct: Compressed Activations for Memory-Efficient LLM Training
Yara Shamshoum
Nitzan Hodos
Yuval Sieradzki
Assaf Schuster
MQ
VLM
107
4
0
20 Oct 2024
Memory-Efficient LLM Training with Online Subspace Descent
Kaizhao Liang
Bo Liu
Lizhang Chen
Qiang Liu
67
15
0
23 Aug 2024
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
Aashiq Muhamed
Oscar Li
David Woodruff
Mona Diab
Virginia Smith
94
13
0
25 Jun 2024
AI and Memory Wall
A. Gholami
Z. Yao
Sehoon Kim
Coleman Hooper
Michael W. Mahoney
Kurt Keutzer
84
161
0
21 Mar 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao
Zhenyu Zhang
Beidi Chen
Zhangyang Wang
A. Anandkumar
Yuandong Tian
106
229
0
06 Mar 2024
Flora: Low-Rank Adapters Are Secretly Gradient Compressors
Yongchang Hao
Yanshuai Cao
Lili Mou
90
55
0
05 Feb 2024
Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators
Yaniv Blumenfeld
Itay Hubara
Daniel Soudry
76
4
0
25 Jan 2024
QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers
Artidoro Pagnoni
Ari Holtzman
Luke Zettlemoyer
ALM
159
2,638
0
23 May 2023
PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
Yanli Zhao
Andrew Gu
R. Varma
Liangchen Luo
Chien-chin Huang
...
Bernard Nguyen
Geeta Chauhan
Y. Hao
Ajit Mathews
Shen Li
FedML
MoE
109
351
0
21 Apr 2023
8-bit Optimizers via Block-wise Quantization
Tim Dettmers
M. Lewis
Sam Shleifer
Luke Zettlemoyer
MQ
150
305
0
06 Oct 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
624
10,625
0
17 Jun 2021
Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks
Itay Hubara
Brian Chmiel
Moshe Island
Ron Banner
S. Naor
Daniel Soudry
124
119
0
16 Feb 2021
GLU Variants Improve Transformer
Noam M. Shazeer
177
1,026
0
12 Feb 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
602
20,418
0
23 Oct 2019
Root Mean Square Layer Normalization
Biao Zhang
Rico Sennrich
119
765
0
16 Oct 2019
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Samyam Rajbhandari
Jeff Rasley
Olatunji Ruwase
Yuxiong He
ALM
AI4CE
90
922
0
04 Oct 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
363
1,925
0
17 Sep 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
782
24,613
0
26 Jul 2019
PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Thijs Vogels
Sai Praneeth Karimireddy
Martin Jaggi
105
322
0
31 May 2019
Fisher Information and Natural Gradient Learning of Random Deep Networks
S. Amari
Ryo Karakida
Masafumi Oizumi
71
36
0
22 Aug 2018
ATOMO: Communication-efficient Learning via Atomic Sparsification
Hongyi Wang
Scott Sievert
Zachary B. Charles
Shengchao Liu
S. Wright
Dimitris Papailiopoulos
95
356
0
11 Jun 2018
Scalable Methods for 8-bit Training of Neural Networks
Ron Banner
Itay Hubara
Elad Hoffer
Daniel Soudry
MQ
90
340
0
25 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.2K
7,210
0
20 Apr 2018
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Noam M. Shazeer
Mitchell Stern
ODL
96
1,055
0
11 Apr 2018
Shampoo: Preconditioned Stochastic Tensor Optimization
Vineet Gupta
Tomer Koren
Y. Singer
ODL
115
226
0
26 Feb 2018
Sparse Communication for Distributed Gradient Descent
Alham Fikri Aji
Kenneth Heafield
92
744
0
17 Apr 2017
Variance Reduction in SGD by Distributed Importance Sampling
Guillaume Alain
Alex Lamb
Chinnadhurai Sankar
Aaron Courville
Yoshua Bengio
FedML
127
200
0
20 Nov 2015
1