ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021
16 April 2021
Samyam Rajbhandari
Olatunji Ruwase
Jeff Rasley
Shaden Smith
Yuxiong He
Papers citing "ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning"
Showing 35 of 235 citing papers.
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022
Zaid Qureshi
Vikram Sharma Mailthody
Isaac Gelado
S. Min
Amna Masood
...
Dmitri Vainbrand
I-Hsin Chung
M. Garland
W. Dally
Wen-mei W. Hwu
09 Mar 2022
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
Shenggan Cheng
Xuanlei Zhao
Guangyang Lu
Bin-Rui Li
Zhongming Yu
Tian Zheng
R. Wu
Xiwen Zhang
Jian Peng
Yang You
02 Mar 2022
Survey on Large Scale Neural Network Training
Julia Gusak
Daria Cherniuk
Alena Shilova
A. Katrutsa
Daniel Bershatsky
...
Lionel Eyraud-Dubois
Oleg Shlyazhko
Denis Dimitrov
Ivan Oseledets
Olivier Beaumont
21 Feb 2022
Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers
Proceedings of the VLDB Endowment (PVLDB), 2022
Youjie Li
Amar Phanishayee
D. Murray
Jakub Tarnawski
Nam Sung Kim
02 Feb 2022
TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
Symposium on Networked Systems Design and Implementation (NSDI), 2022
Weiyang Wang
Moein Khazraee
Zhizhen Zhong
M. Ghobadi
Zhihao Jia
Dheevatsa Mudigere
Ying Zhang
A. Kewitsch
01 Feb 2022
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
...
Mohammad Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
28 Jan 2022
GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge
Symposium on Networked Systems Design and Implementation (NSDI), 2022
Arthi Padmanabhan
Neil Agarwal
Anand Iyer
Ganesh Ananthanarayanan
Yuanchao Shu
Nikolaos Karianakis
G. Xu
Ravi Netravali
19 Jan 2022
Analyzing the Limits of Self-Supervision in Handling Bias in Language
Lisa Bauer
Karthik Gopalakrishnan
Spandana Gella
Yang Liu
Joey Tianyi Zhou
Dilek Z. Hakkani-Tür
16 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
08 Dec 2021
End-to-end Adaptive Distributed Training on PaddlePaddle
Yulong Ao
Zhihua Wu
Dianhai Yu
Weibao Gong
Zhiqing Kui
...
Yanjun Ma
Tian Wu
Haifeng Wang
Wei Zeng
Chao Yang
06 Dec 2021
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu
Changhan Wang
Andros Tjandra
Kushal Lakhotia
Qiantong Xu
...
Yatharth Saraf
J. Pino
Alexei Baevski
Alexis Conneau
Michael Auli
17 Nov 2021
Amazon SageMaker Model Parallelism: A General and Flexible Framework for Large Model Training
C. Karakuş
R. Huilgol
Leilei Gan
Anirudh Subramanian
Cade Daniel
D. Çavdar
Teng Xu
Haohan Chen
Arash Rahnama
L. Quintela
10 Nov 2021
A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks
Daniel Nichols
Siddharth Singh
Shuqing Lin
A. Bhatele
09 Nov 2021
Varuna: Scalable, Low-cost Training of Massive Deep Learning Models
Sanjith Athlur
Nitika Saran
Muthian Sivathanu
Ramachandran Ramjee
Nipun Kwatra
07 Nov 2021
Sustainable AI: Environmental Implications, Challenges and Opportunities
Conference on Machine Learning and Systems (MLSys), 2021
Carole-Jean Wu
Ramya Raghavendra
Udit Gupta
Bilge Acun
Newsha Ardalani
...
Maximilian Balandat
Joe Spisak
R. Jain
Michael G. Rabbat
K. Hazelwood
30 Oct 2021
OneFlow: Redesign the Distributed Deep Learning Framework from Scratch
Jinhui Yuan
Xinqi Li
Cheng Cheng
Juncheng Liu
Ran Guo
...
Fei Yang
Xiaodong Yi
Chuan Wu
Haoran Zhang
Jie Zhao
28 Oct 2021
Towards artificial general intelligence via a multimodal foundation model
Nanyi Fei
Zhiwu Lu
Yizhao Gao
Guoxing Yang
Yuqi Huo
...
Ruihua Song
Xin Gao
Tao Xiang
Hao Sun
Ji-Rong Wen
27 Oct 2021
AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021
Siddharth Singh
A. Bhatele
25 Oct 2021
Hydra: A System for Large Multi-Model Deep Learning
Kabir Nagrecha
Arun Kumar
16 Oct 2021
PAGnol: An Extra-Large French Generative Model
Julien Launay
E. L. Tommasone
B. Pannier
François Boniface
A. Chatelain
Alessandro Cappelli
Iacopo Poli
Djamé Seddah
16 Oct 2021
A Short Study on Compressing Decoder-Based Language Models
Tianda Li
Yassir El Mesbahi
I. Kobyzev
Ahmad Rashid
A. Mahmud
Nithin Anchuri
Habib Hajimolahoseini
Yang Liu
Mehdi Rezagholizadeh
16 Oct 2021
M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining
Junyang Lin
An Yang
Jinze Bai
Chang Zhou
Le Jiang
...
Jie Zhang
Yong Li
Jialin Li
Jingren Zhou
Hongxia Yang
08 Oct 2021
8-bit Optimizers via Block-wise Quantization
Tim Dettmers
M. Lewis
Sam Shleifer
Luke Zettlemoyer
06 Oct 2021
Is the Number of Trainable Parameters All That Actually Matters?
A. Chatelain
Amine Djeghri
Daniel Hesslow
Julien Launay
Iacopo Poli
24 Sep 2021
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2021
Jiarui Fang
Zilin Zhu
Shenggui Li
Hui Su
Yang Yu
Jie Zhou
Yang You
12 Aug 2021
AutoFL: Enabling Heterogeneity-Aware Energy Efficient Federated Learning
IEEE/ACM International Symposium on Microarchitecture (MICRO), 2021
Young Geun Kim
Carole-Jean Wu
16 Jul 2021
Pre-Trained Models: Past, Present and Future
AI Open (AO), 2021
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
14 Jun 2021
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier
04 Jun 2021
M6-T: Exploring Sparse Expert Models and Beyond
An Yang
Junyang Lin
Rui Men
Chang Zhou
Le Jiang
...
Dingyang Zhang
Jialin Li
Lin Qu
Jingren Zhou
Hongxia Yang
31 May 2021
Tesseract: Parallelize the Tensor Parallelism Efficiently
International Conference on Parallel Processing (ICPP), 2021
Boxiang Wang
Qifan Xu
Zhengda Bian
Yang You
30 May 2021
Maximizing Parallelism in Distributed Training for Huge Neural Networks
Zhengda Bian
Qifan Xu
Boxiang Wang
Yang You
30 May 2021
Sequence Parallelism: Long Sequence Training from System Perspective
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Shenggui Li
Fuzhao Xue
Chaitanya Baranwal
Yongbin Li
Yang You
26 May 2021
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021
Deepak Narayanan
Mohammad Shoeybi
Jared Casper
P. LeGresley
M. Patwary
...
Prethvi Kashinkunti
J. Bernauer
Bryan Catanzaro
Amar Phanishayee
Matei A. Zaharia
09 Apr 2021
Whale: Efficient Giant Model Training over Heterogeneous GPUs
USENIX Annual Technical Conference (USENIX ATC), 2020
Chencan Wu
Le Jiang
Ang Wang
Wencong Xiao
Ziji Shi
...
Lan-yue Chen
Yong Li
Zhen Zheng
Xiaoyong Liu
Wei Lin
18 Nov 2020
Neural Parameter Allocation Search
Bryan A. Plummer
Nikoli Dryden
Julius Frost
Torsten Hoefler
Kate Saenko
18 Jun 2020