ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021
16 April 2021
Samyam Rajbhandari
Olatunji Ruwase
Jeff Rasley
Shaden Smith
Yuxiong He
Papers citing "ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning"
Showing 35 of 235 citing papers.
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022
Zaid Qureshi
Vikram Sharma Mailthody
Isaac Gelado
S. Min
Amna Masood
...
Dmitri Vainbrand
I-Hsin Chung
M. Garland
W. Dally
Wen-mei W. Hwu
09 Mar 2022
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
Shenggan Cheng
Xuanlei Zhao
Guangyang Lu
Bin-Rui Li
Zhongming Yu
Tian Zheng
R. Wu
Xiwen Zhang
Jian Peng
Yang You
02 Mar 2022
Survey on Large Scale Neural Network Training
Julia Gusak
Daria Cherniuk
Alena Shilova
A. Katrutsa
Daniel Bershatsky
...
Lionel Eyraud-Dubois
Oleg Shlyazhko
Denis Dimitrov
Ivan Oseledets
Olivier Beaumont
21 Feb 2022
Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers
Proceedings of the VLDB Endowment (PVLDB), 2022
Youjie Li
Amar Phanishayee
D. Murray
Jakub Tarnawski
Nam Sung Kim
02 Feb 2022
TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs
Symposium on Networked Systems Design and Implementation (NSDI), 2022
Weiyang Wang
Moein Khazraee
Zhizhen Zhong
M. Ghobadi
Zhihao Jia
Dheevatsa Mudigere
Ying Zhang
A. Kewitsch
01 Feb 2022
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
...
Mohammad Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
28 Jan 2022
GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge
Symposium on Networked Systems Design and Implementation (NSDI), 2022
Arthi Padmanabhan
Neil Agarwal
Anand Iyer
Ganesh Ananthanarayanan
Yuanchao Shu
Nikolaos Karianakis
G. Xu
Ravi Netravali
19 Jan 2022
Analyzing the Limits of Self-Supervision in Handling Bias in Language
Lisa Bauer
Karthik Gopalakrishnan
Spandana Gella
Yang Liu
Joey Tianyi Zhou
Dilek Z. Hakkani-Tür
16 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
08 Dec 2021
End-to-end Adaptive Distributed Training on PaddlePaddle
Yulong Ao
Zhihua Wu
Dianhai Yu
Weibao Gong
Zhiqing Kui
...
Yanjun Ma
Tian Wu
Haifeng Wang
Wei Zeng
Chao Yang
06 Dec 2021
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu
Changhan Wang
Andros Tjandra
Kushal Lakhotia
Qiantong Xu
...
Yatharth Saraf
J. Pino
Alexei Baevski
Alexis Conneau
Michael Auli
17 Nov 2021
Amazon SageMaker Model Parallelism: A General and Flexible Framework for Large Model Training
C. Karakuş
R. Huilgol
Leilei Gan
Anirudh Subramanian
Cade Daniel
D. Çavdar
Teng Xu
Haohan Chen
Arash Rahnama
L. Quintela
10 Nov 2021
A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks
Daniel Nichols
Siddharth Singh
Shuqing Lin
A. Bhatele
09 Nov 2021
Varuna: Scalable, Low-cost Training of Massive Deep Learning Models
Sanjith Athlur
Nitika Saran
Muthian Sivathanu
Ramachandran Ramjee
Nipun Kwatra
07 Nov 2021
Sustainable AI: Environmental Implications, Challenges and Opportunities
Conference on Machine Learning and Systems (MLSys), 2021
Carole-Jean Wu
Ramya Raghavendra
Udit Gupta
Bilge Acun
Newsha Ardalani
...
Maximilian Balandat
Joe Spisak
R. Jain
Michael G. Rabbat
K. Hazelwood
30 Oct 2021
OneFlow: Redesign the Distributed Deep Learning Framework from Scratch
Jinhui Yuan
Xinqi Li
Cheng Cheng
Juncheng Liu
Ran Guo
...
Fei Yang
Xiaodong Yi
Chuan Wu
Haoran Zhang
Jie Zhao
28 Oct 2021
Towards artificial general intelligence via a multimodal foundation model
Nanyi Fei
Zhiwu Lu
Yizhao Gao
Guoxing Yang
Yuqi Huo
...
Ruihua Song
Xin Gao
Tao Xiang
Hao Sun
Ji-Rong Wen
27 Oct 2021
AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021
Siddharth Singh
A. Bhatele
25 Oct 2021
Hydra: A System for Large Multi-Model Deep Learning
Kabir Nagrecha
Arun Kumar
16 Oct 2021
PAGnol: An Extra-Large French Generative Model
Julien Launay
E. L. Tommasone
B. Pannier
François Boniface
A. Chatelain
Alessandro Cappelli
Iacopo Poli
Djamé Seddah
16 Oct 2021
A Short Study on Compressing Decoder-Based Language Models
Tianda Li
Yassir El Mesbahi
I. Kobyzev
Ahmad Rashid
A. Mahmud
Nithin Anchuri
Habib Hajimolahoseini
Yang Liu
Mehdi Rezagholizadeh
16 Oct 2021
M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining
Junyang Lin
An Yang
Jinze Bai
Chang Zhou
Le Jiang
...
Jie Zhang
Yong Li
Jialin Li
Jingren Zhou
Hongxia Yang
08 Oct 2021
8-bit Optimizers via Block-wise Quantization
Tim Dettmers
M. Lewis
Sam Shleifer
Luke Zettlemoyer
06 Oct 2021
Is the Number of Trainable Parameters All That Actually Matters?
A. Chatelain
Amine Djeghri
Daniel Hesslow
Julien Launay
Iacopo Poli
24 Sep 2021
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2021
Jiarui Fang
Zilin Zhu
Shenggui Li
Hui Su
Yang Yu
Jie Zhou
Yang You
12 Aug 2021
AutoFL: Enabling Heterogeneity-Aware Energy Efficient Federated Learning
IEEE/ACM International Symposium on Microarchitecture (MICRO), 2021
Young Geun Kim
Carole-Jean Wu
16 Jul 2021
Pre-Trained Models: Past, Present and Future
AI Open (AO), 2021
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
14 Jun 2021
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier
04 Jun 2021
M6-T: Exploring Sparse Expert Models and Beyond
An Yang
Junyang Lin
Rui Men
Chang Zhou
Le Jiang
...
Dingyang Zhang
Jialin Li
Lin Qu
Jingren Zhou
Hongxia Yang
31 May 2021
Tesseract: Parallelize the Tensor Parallelism Efficiently
International Conference on Parallel Processing (ICPP), 2021
Boxiang Wang
Qifan Xu
Zhengda Bian
Yang You
30 May 2021
Maximizing Parallelism in Distributed Training for Huge Neural Networks
Zhengda Bian
Qifan Xu
Boxiang Wang
Yang You
30 May 2021
Sequence Parallelism: Long Sequence Training from System Perspective
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Shenggui Li
Fuzhao Xue
Chaitanya Baranwal
Yongbin Li
Yang You
26 May 2021
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021
Deepak Narayanan
Mohammad Shoeybi
Jared Casper
P. LeGresley
M. Patwary
...
Prethvi Kashinkunti
J. Bernauer
Bryan Catanzaro
Amar Phanishayee
Matei A. Zaharia
09 Apr 2021
Whale: Efficient Giant Model Training over Heterogeneous GPUs
USENIX Annual Technical Conference (USENIX ATC), 2020
Chencan Wu
Le Jiang
Ang Wang
Wencong Xiao
Ziji Shi
...
Lan-yue Chen
Yong Li
Zhen Zheng
Xiaoyong Liu
Wei Lin
18 Nov 2020
Neural Parameter Allocation Search
Bryan A. Plummer
Nikoli Dryden
Julius Frost
Torsten Hoefler
Kate Saenko
18 Jun 2020