Memory-Efficient Pipeline-Parallel DNN Training
Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei A. Zaharia
arXiv 2006.09503 · 16 June 2020 · [MoE]
Papers citing "Memory-Efficient Pipeline-Parallel DNN Training" (50 of 109 papers shown):
Efficient Parallelization Layouts for Large-Scale Distributed Model Training · Johannes Hagemann, Samuel Weinbach, Konstantin Dobler, Maximilian Schall, Gerard de Melo · [LRM] · 09 Nov 2023 · 37 / 6 / 0
Practical Performance Guarantees for Pipelined DNN Inference · Aaron Archer, Matthew Fahrbach, Kuikui Liu, Prakash Prabhu · 07 Nov 2023 · 29 / 0 / 0
Ring Attention with Blockwise Transformers for Near-Infinite Context · Hao Liu, Matei A. Zaharia, Pieter Abbeel · 03 Oct 2023 · 38 / 217 / 0
Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey · Sicong Liu, Bin Guo, Cheng Fang, Ziqi Wang, Shiyan Luo, Zimu Zhou, Zhiwen Yu · [AI4CE] · 27 Sep 2023 · 34 / 22 / 0
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models · S. A. Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, L. Song, Samyam Rajbhandari, Yuxiong He · 25 Sep 2023 · 25 / 103 / 0
GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism · Jingji Chen, Zhuoming Chen, Xuehai Qian · [GNN, AI4CE] · 19 Aug 2023 · 33 / 3 / 0
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming · Hao Lin, Ke Wu, Jie Li, Jun Yu Li, Wu-Jun Li · 31 Jul 2023 · 33 / 1 / 0
Improving Automatic Parallel Training via Balanced Memory Workload Optimization · Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui · 05 Jul 2023 · 45 / 9 / 0
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training · Guanhua Wang, Heyang Qin, S. A. Jacobs, Connor Holmes, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He · [VLM] · 16 Jun 2023 · 59 / 57 / 0
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning · Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui · 17 May 2023 · 36 / 11 / 0
Cloudless-Training: A Framework to Improve Efficiency of Geo-Distributed ML Training · W. Tan, Xiao Shi, Cunchi Lv, Xiaofang Zhao · [FedML] · 09 Mar 2023 · 20 / 1 / 0
SWIFT: Expedited Failure Recovery for Large-scale DNN Training · Keon Jang, Hassan M. G. Wassel, Behnam Montazeri, Michael Ryan, David Wetherall · 13 Feb 2023 · 17 / 8 / 0
Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks · Christopher W. F. Parsonson, Zacharaya Shabka, Alessandro Ottino, G. Zervas · 31 Jan 2023 · 34 / 0 / 0
AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost · Jinfan Chen, Shigang Li, Ran Guo, Jinhui Yuan, Torsten Hoefler · 17 Jan 2023 · 23 / 2 / 0
Systems for Parallel and Distributed Large-Model Deep Learning Training · Kabir Nagrecha · [GNN, VLM, MoE] · 06 Jan 2023 · 26 / 7 / 0
Does compressing activations help model parallel training? · S. Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman · 06 Jan 2023 · 19 / 5 / 0
PiPAD: Pipelined and Parallel Dynamic GNN Training on GPUs · Chunyang Wang, Desen Sun, Yunru Bai · [GNN, AI4CE] · 01 Jan 2023 · 50 / 15 / 0
Deep Incubation: Training Large Models by Divide-and-Conquering · Zanlin Ni, Yulin Wang, Jiangwei Yu, Haojun Jiang, Yu Cao, Gao Huang · [VLM] · 08 Dec 2022 · 18 / 11 / 0
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts · Trevor Gale, Deepak Narayanan, C. Young, Matei A. Zaharia · [MoE] · 29 Nov 2022 · 14 / 102 / 0
PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices · Kazuki Osawa, Shigang Li, Torsten Hoefler · [AI4CE] · 25 Nov 2022 · 35 / 24 / 0
Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism · Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui · [GNN, MoE] · 25 Nov 2022 · 37 / 60 / 0
On Optimizing the Communication of Model Parallelism · Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Haotong Zhang · 10 Nov 2022 · 22 / 24 / 0
PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks · Enrico Meloni, Lapo Faggi, Simone Marullo, Alessandro Betti, Matteo Tiezzi, Marco Gori, S. Melacci · [GNN, AI4TS] · 17 Oct 2022 · 19 / 1 / 0
Demand Layering for Real-Time DNN Inference with Minimized Memory Usage · Min-Zhi Ji, Saehanseul Yi, Chang-Mo Koo, Sol Ahn, Dongjoo Seo, N. Dutt, Jong-Chan Kim · 08 Oct 2022 · 42 / 16 / 0
GLM-130B: An Open Bilingual Pre-trained Model · Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, ..., Jidong Zhai, Wenguang Chen, Peng-Zhen Zhang, Yuxiao Dong, Jie Tang · [BDL, LRM] · 05 Oct 2022 · 250 / 1,073 / 0
MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms · Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin J. Barker, Ang Li, Yufei Ding · [GNN] · 14 Sep 2022 · 45 / 25 / 0
Dive into Big Model Training · Qinghua Liu, Yuxiang Jiang · [MoMe, AI4CE, LRM] · 25 Jul 2022 · 18 / 3 / 0
Merak: An Efficient Distributed DNN Training Framework with Automated 3D Parallelism for Giant Foundation Models · Zhiquan Lai, Shengwei Li, Xudong Tang, Ke-shi Ge, Weijie Liu, Yabo Duan, Linbo Qiao, Dongsheng Li · 10 Jun 2022 · 27 / 39 / 0
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees · Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Ré, Ce Zhang · [AI4CE] · 02 Jun 2022 · 19 / 11 / 0
Decentralized Training of Foundation Models in Heterogeneous Environments · Binhang Yuan, Yongjun He, Jared Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Ré, Ce Zhang · 02 Jun 2022 · 25 / 90 / 0
Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters · Yang Xiang, Zhihua Wu, Weibao Gong, Siyu Ding, Xianjie Mo, ..., Yue Yu, Ge Li, Yu Sun, Yanjun Ma, Dianhai Yu · 19 May 2022 · 19 / 4 / 0
Reducing Activation Recomputation in Large Transformer Models · V. Korthikanti, Jared Casper, Sangkug Lym, Lawrence C. McAfee, M. Andersch, M. Shoeybi, Bryan Catanzaro · [AI4CE] · 10 May 2022 · 27 / 256 / 0
Efficient Pipeline Planning for Expedited Distributed DNN Training · Ziyue Luo, Xiaodong Yi, Guoping Long, Shiqing Fan, Chuan Wu, Jun Yang, Wei Lin · 22 Apr 2022 · 28 / 16 / 0
PaLM: Scaling Language Modeling with Pathways · Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, ..., Kathy Meier-Hellstern, Douglas Eck, J. Dean, Slav Petrov, Noah Fiedel · [PILM, LRM] · 05 Apr 2022 · 91 / 6,015 / 0
Shisha: Online scheduling of CNN pipelines on heterogeneous architectures · Pirah Noor Soomro, M. Abduljabbar, J. Castrillón, Miquel Pericàs · 23 Feb 2022 · 24 / 1 / 0
Survey on Large Scale Neural Network Training · Julia Gusak, Daria Cherniuk, Alena Shilova, A. Katrutsa, Daniel Bershatsky, ..., Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan V. Oseledets, Olivier Beaumont · 21 Feb 2022 · 22 / 10 / 0
DistrEdge: Speeding up Convolutional Neural Network Inference on Distributed Edge Devices · Xueyu Hou, Yongjie Guan, Tao Han, Ning Zhang · 03 Feb 2022 · 19 / 41 / 0
Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers · Youjie Li, Amar Phanishayee, D. Murray, Jakub Tarnawski, N. Kim · 02 Feb 2022 · 16 / 19 / 0
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning · Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, ..., Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica · [MoE] · 28 Jan 2022 · 24 / 104 / 0
Near-Optimal Sparse Allreduce for Distributed Deep Learning · Shigang Li, Torsten Hoefler · 19 Jan 2022 · 23 / 50 / 0
ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation · Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, ..., Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang · [ELM] · 23 Dec 2021 · 39 / 79 / 0
Layer-Parallel Training of Residual Networks with Auxiliary-Variable Networks · Qi Sun, Hexin Dong, Zewei Chen, Jiacheng Sun, Zhenguo Li, Bin Dong · 10 Dec 2021 · 27 / 1 / 0
Automap: Towards Ergonomic Automated Parallelism for ML Models · Michael Schaarschmidt, Dominik Grewe, Dimitrios Vytiniotis, Adam Paszke, G. Schmid, ..., James Molloy, Jonathan Godwin, Norman A. Rink, Vinod Nair, Dan Belov · [MoE] · 06 Dec 2021 · 17 / 16 / 0
End-to-end Adaptive Distributed Training on PaddlePaddle · Yulong Ao, Zhihua Wu, Dianhai Yu, Weibao Gong, Zhiqing Kui, ..., Yanjun Ma, Tian Wu, Haifeng Wang, Wei Zeng, Chao Yang · 06 Dec 2021 · 19 / 10 / 0
Amazon SageMaker Model Parallelism: A General and Flexible Framework for Large Model Training · C. Karakuş, R. Huilgol, Fei Wu, Anirudh Subramanian, Cade Daniel, D. Çavdar, Teng Xu, Haohan Chen, Arash Rahnama, L. Quintela · [MoE, AI4CE] · 10 Nov 2021 · 28 / 28 / 0
DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution · Keshav Santhanam, Siddharth Krishna, Ryota Tomioka, Tim Harris, Matei A. Zaharia · 09 Nov 2021 · 12 / 5 / 0
Varuna: Scalable, Low-cost Training of Massive Deep Learning Models · Sanjith Athlur, Nitika Saran, Muthian Sivathanu, Ramachandran Ramjee, Nipun Kwatra · [GNN] · 07 Nov 2021 · 31 / 80 / 0
Pipeline Parallelism for Inference on Heterogeneous Edge Computing · Yang Hu, Connor Imes, Xuanang Zhao, Souvik Kundu, P. Beerel, S. Crago, J. Walters · [MoE] · 28 Oct 2021 · 99 / 19 / 0
Hydra: A System for Large Multi-Model Deep Learning · Kabir Nagrecha, Arun Kumar · [MoE, AI4CE] · 16 Oct 2021 · 38 / 5 / 0
Scheduling Optimization Techniques for Neural Network Training · Hyungjun Oh, Junyeol Lee, HyeongJu Kim, Jiwon Seo · 03 Oct 2021 · 21 / 0 / 0