Reducing Activation Recomputation in Large Transformer Models
arXiv:2205.05198 · 10 May 2022
V. Korthikanti, Jared Casper, Sangkug Lym, Lawrence C. McAfee, M. Andersch, M. Shoeybi, Bryan Catanzaro
Tags: AI4CE
Papers citing "Reducing Activation Recomputation in Large Transformer Models" (14 of 164 shown):
An Evaluation of Memory Optimization Methods for Training Neural Networks (26 Mar 2023)
Xiaoxuan Liu, Siddharth Jha, Alvin Cheung

Extending the Pre-Training of BLOOM for Improved Support of Traditional Chinese: Models, Methods and Results (08 Mar 2023)
Philipp Ennen, Po-Chun Hsu, Chan-Jan Hsu, Chang-Le Liu, Yen-Chen Wu, Yin-Hsiang Liao, Chin-Tung Lin, Da-shan Shiu, Wei-Yun Ma
Tags: OSLM, VLM, AI4CE

LLaMA: Open and Efficient Foundation Language Models (27 Feb 2023)
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, ..., Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
Tags: ALM, PILM

Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform (16 Feb 2023)
Shiwei Zhang, Lansong Diao, Siyu Wang, Zongyan Cao, Yiliang Gu, Chang Si, Ziji Shi, Zhen Zheng, Chuan Wu, W. Lin
Tags: AI4CE

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training (16 Feb 2023)
Hongzheng Chen, Cody Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang

Alternating Updates for Efficient Transformers (30 Jan 2023)
Cenk Baykal, D. Cutler, Nishanth Dikkala, Nikhil Ghosh, Rina Panigrahy, Xin Wang
Tags: MoE

SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction (21 Jan 2023)
Zhiqi Lin, Youshan Miao, Guodong Liu, Xiaoxiang Shi, Quanlu Zhang, ..., Xu Cao, Cheng-Wu Li, Mao Yang, Lintao Zhang, Lidong Zhou

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts (29 Nov 2022)
Trevor Gale, Deepak Narayanan, C. Young, Matei A. Zaharia
Tags: MoE

Breadth-First Pipeline Parallelism (11 Nov 2022)
J. Lamy-Poirier
Tags: GNN, MoE, AI4CE

On Optimizing the Communication of Model Parallelism (10 Nov 2022)
Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Haotong Zhang

Efficiently Scaling Transformer Inference (09 Nov 2022)
Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, J. Dean

Scaling Laws Beyond Backpropagation (26 Oct 2022)
Matthew J. Filipovich, Alessandro Cappelli, Daniel Hesslow, Julien Launay

ZeRO-Offload: Democratizing Billion-Scale Model Training (18 Jan 2021)
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He
Tags: MoE

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (17 Sep 2019)
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Tags: MoE