ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning (arXiv:2104.07857)
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021
16 April 2021
Samyam Rajbhandari
Olatunji Ruwase
Jeff Rasley
Shaden Smith
Yuxiong He
GNN
Papers citing
"ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning"
50 / 235 papers shown
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Guanhua Wang
Heyang Qin
S. A. Jacobs
Connor Holmes
Samyam Rajbhandari
Olatunji Ruwase
Feng Yan
Lei Yang
Yuxiong He
VLM
222
78
0
16 Jun 2023
Full Parameter Fine-tuning for Large Language Models with Limited Resources
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Kai Lv
Yuqing Yang
Tengxiao Liu
Qi-jie Gao
Qipeng Guo
Xipeng Qiu
330
186
0
16 Jun 2023
Proteus: Simulating the Performance of Distributed DNN Training
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2023
Jiangfei Duan
Xiuhong Li
Ping Xu
Xingcheng Zhang
Shengen Yan
Yun Liang
Dahua Lin
223
13
0
04 Jun 2023
Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training
European Conference on Artificial Intelligence (ECAI), 2023
Yijia Zhang
Yibo Han
Shijie Cao
Guohao Dai
Youshan Miao
Ting Cao
Fan Yang
Ningyi Xu
118
5
0
31 May 2023
Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2023
Shengwei Li
Zhiquan Lai
Yanqi Hao
Weijie Liu
Ke-shi Ge
Xiaoge Deng
Dongsheng Li
KaiCheng Lu
175
11
0
25 May 2023
Scaling Speech Technology to 1,000+ Languages
Journal of Machine Learning Research (JMLR), 2023
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
391
522
0
22 May 2023
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Youhe Jiang
Fangcheng Fu
Xupeng Miao
Xiaonan Nie
Tengjiao Wang
217
14
0
17 May 2023
DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training
Computer Vision and Pattern Recognition (CVPR), 2023
Yihao Chen
Xianbiao Qi
Jianan Wang
Lei Zhang
175
24
0
17 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
296
51
0
07 Apr 2023
The Online Pause and Resume Problem: Optimal Algorithms and An Application to Carbon-Aware Load Shifting
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), 2023
Adam Lechowicz
Nicolas H. Christianson
Jinhang Zuo
Noman Bashir
Mohammad Hajiesmaili
Adam Wierman
Prashant J. Shenoy
175
28
0
30 Mar 2023
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2023
William Won
Taekyung Heo
Saeed Rashidi
Srinivas Sridharan
Sudarshan Srinivasan
T. Krishna
142
83
0
24 Mar 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
International Conference on Machine Learning (ICML), 2023
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Abigail Z. Jacobs
Christopher Ré
Ion Stoica
Ce Zhang
451
575
0
13 Mar 2023
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training
International Conference on Supercomputing (ICS), 2023
Siddharth Singh
Olatunji Ruwase
A. A. Awan
Samyam Rajbhandari
Yuxiong He
A. Bhatele
MoE
218
64
0
11 Mar 2023
Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent
Proceedings of the VLDB Endowment (PVLDB), 2023
Xiaonan Nie
Yi Liu
Fangcheng Fu
Jinbao Xue
Dian Jiao
Xupeng Miao
Yangyu Tao
Tengjiao Wang
MoE
205
24
0
06 Mar 2023
SWIFT: Expedited Failure Recovery for Large-scale DNN Training
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2023
Keon Jang
Hassan M. G. Wassel
Behnam Montazeri
Michael Ryan
David Wetherall
162
17
0
13 Feb 2023
Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2023
Siddharth Singh
A. Bhatele
275
10
0
10 Feb 2023
Computation vs. Communication Scaling for Future Transformers on Future Hardware
Suchita Pati
Shaizeen Aga
Mahzabeen Islam
Nuwan Jayasena
Matthew D. Sinclair
262
14
0
06 Feb 2023
Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
Yuliang Liu
Shenggui Li
Jiarui Fang
Yan Shao
Boyuan Yao
Yang You
OffRL
216
11
0
06 Feb 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
International Conference on Machine Learning (ICML), 2023
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
365
55
0
27 Jan 2023
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2023
Jaeyong Song
Jinkyu Yim
Jaewon Jung
Hongsun Jang
H. Kim
Youngsok Kim
Jinho Lee
GNN
274
39
0
24 Jan 2023
ATP: Adaptive Tensor Parallelism for Foundation Models
Shenggan Cheng
Ziming Liu
Jiangsu Du
Yang You
138
11
0
20 Jan 2023
Systems for Parallel and Distributed Large-Model Deep Learning Training
Kabir Nagrecha
GNN
VLM
MoE
154
9
0
06 Jan 2023
Elixir: Train a Large Language Model on a Small GPU Cluster
Haichen Huang
Jiarui Fang
Hongxin Liu
Shenggui Li
Yang You
VLM
250
10
0
10 Dec 2022
Deep Incubation: Training Large Models by Divide-and-Conquering
IEEE International Conference on Computer Vision (ICCV), 2022
Zanlin Ni
Yulin Wang
Jiangwei Yu
Haojun Jiang
Yu Cao
Gao Huang
VLM
243
13
0
08 Dec 2022
COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training
D. Kadiyala
Saeed Rashidi
Taekyung Heo
Abhimanyu Bambhaniya
T. Krishna
Alexandros Daglis
VLM
172
11
0
30 Nov 2022
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices
Conference on Machine Learning and Systems (MLSys), 2022
Kazuki Osawa
Shigang Li
Torsten Hoefler
AI4CE
206
36
0
25 Nov 2022
Breadth-First Pipeline Parallelism
J. Lamy-Poirier
GNN
MoE
AI4CE
121
1
0
11 Nov 2022
On Optimizing the Communication of Model Parallelism
Conference on Machine Learning and Systems (MLSys), 2022
Yonghao Zhuang
Hexu Zhao
Lianmin Zheng
Zhuohan Li
Eric P. Xing
Qirong Ho
Joseph E. Gonzalez
Ion Stoica
Haotong Zhang
197
40
0
10 Nov 2022
Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction
Neural Information Processing Systems (NeurIPS), 2022
Muralidhar Andoorveedu
Zhanda Zhu
Bojian Zheng
Gennady Pekhimenko
185
8
0
19 Oct 2022
Mimose: An Input-Aware Checkpointing Planner for Efficient Training on GPU
Jian-He Liao
Mingzhen Li
Qingxiao Sun
Jiwei Hao
F. Yu
...
Ye Tao
Zicheng Zhang
Hailong Yang
Zhongzhi Luan
D. Qian
146
4
0
06 Sep 2022
Petals: Collaborative Inference and Fine-tuning of Large Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Alexander Borzunov
Dmitry Baranchuk
Tim Dettmers
Max Ryabinin
Younes Belkada
Artem Chumachenko
Pavel Samygin
Colin Raffel
VLM
224
95
0
02 Sep 2022
Training a T5 Using Lab-sized Resources
Manuel R. Ciosici
Leon Derczynski
VLM
178
8
0
25 Aug 2022
PromptFL: Let Federated Participants Cooperatively Learn Prompts Instead of Models -- Federated Learning in Age of Foundation Model
IEEE Transactions on Mobile Computing (IEEE TMC), 2022
Tao Guo
Song Guo
Junxiao Wang
Wenchao Xu
FedML
VLM
LRM
197
189
0
24 Aug 2022
Multimodal foundation models are better simulators of the human brain
Haoyu Lu
Qiongyi Zhou
Nanyi Fei
Zhiwu Lu
Mingyu Ding
...
Changde Du
Xin Zhao
Haoran Sun
Huiguang He
J. Wen
AI4CE
172
19
0
17 Aug 2022
PolarFly: A Cost-Effective and Flexible Low-Diameter Topology
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2022
Kartik Lakhotia
Maciej Besta
Laura Monroe
K. Isham
Patrick Iff
Torsten Hoefler
Fabrizio Petrini
351
28
0
02 Aug 2022
Dive into Big Model Training
Qinghua Liu
Yuxiang Jiang
MoMe
AI4CE
LRM
108
3
0
25 Jul 2022
Machine Learning Model Sizes and the Parameter Gap
Pablo Villalobos
J. Sevilla
T. Besiroglu
Lennart Heim
A. Ho
Marius Hobbhahn
ALM
ELM
AI4CE
189
78
0
05 Jul 2022
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2022
Reza Yazdani Aminabadi
Samyam Rajbhandari
Minjia Zhang
A. A. Awan
Cheng-rong Li
...
Elton Zheng
Jeff Rasley
Shaden Smith
Olatunji Ruwase
Yuxiong He
408
506
0
30 Jun 2022
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network
Conference on Machine Learning and Systems (MLSys), 2022
Vitaliy Chiley
Vithursan Thangarasa
Abhay Gupta
Anshul Samar
Joel Hestness
D. DeCoste
193
13
0
28 Jun 2022
LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
International Conference on Learning Representations (ICLR), 2022
Gunho Park
Baeseong Park
Minsub Kim
Sungjae Lee
Jeonghoon Kim
Beomseok Kwon
S. Kwon
Byeongwook Kim
Youngjoo Lee
Dongsoo Lee
MQ
447
113
0
20 Jun 2022
Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2022
Zhiquan Lai
Shengwei Li
Xudong Tang
Ke-shi Ge
Weijie Liu
Yabo Duan
Linbo Qiao
Dongsheng Li
316
61
0
10 Jun 2022
A New Frontier of AI: On-Device AI Training and Personalization
Jijoong Moon
Parichay Kapoor
Ji Hoon Lee
Donghak Park
Seungbaek Hong
Hyungyu Lee
Donghyeon Jeong
Sungsik Kong
MyungJoo Ham
171
4
0
09 Jun 2022
Can Foundation Models Help Us Achieve Perfect Secrecy?
Simran Arora
Christopher Ré
FedML
245
12
0
27 May 2022
Reducing Activation Recomputation in Large Transformer Models
Conference on Machine Learning and Systems (MLSys), 2022
V. Korthikanti
Jared Casper
Sangkug Lym
Lawrence C. McAfee
M. Andersch
Mohammad Shoeybi
Bryan Catanzaro
AI4CE
300
385
0
10 May 2022
Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards
International Symposium on Computer Architecture (ISCA), 2022
Youngeun Kwon
Minsoo Rhu
143
30
0
10 May 2022
MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Proceedings of the VLDB Endowment (PVLDB), 2022
Zhen Zhang
Shuai Zheng
Yida Wang
Justin Chiu
George Karypis
Trishul Chilimbi
Mu Li
Xin Jin
451
47
0
30 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Journal of Machine Learning Research (JMLR), 2022
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
1.2K
7,457
0
05 Apr 2022
DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation
Yu Tang
Chenyu Wang
Yufan Zhang
Yuliang Liu
Xingcheng Zhang
Linbo Qiao
Zhiquan Lai
Dongsheng Li
224
6
0
30 Mar 2022
Pathways: Asynchronous Distributed Dataflow for ML
Conference on Machine Learning and Systems (MLSys), 2022
P. Barham
Aakanksha Chowdhery
J. Dean
Sanjay Ghemawat
Steven Hand
...
Parker Schuh
Ryan Sepassi
Laurent El Shafey
C. A. Thekkath
Yonghui Wu
GNN
MoE
296
145
0
23 Mar 2022
DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction
Buyun Zhang
Liangchen Luo
Xi Liu
Jay Li
Zeliang Chen
...
Yasmine Badr
Jongsoo Park
Jiyan Yang
Dheevatsa Mudigere
Ellie Wen
3DV
149
12
0
11 Mar 2022