Reducing Activation Recomputation in Large Transformer Models
V. Korthikanti, Jared Casper, Sangkug Lym, Lawrence C. McAfee, M. Andersch, M. Shoeybi, Bryan Catanzaro
10 May 2022 [AI4CE]
arXiv: 2205.05198

Papers citing "Reducing Activation Recomputation in Large Transformer Models"
Showing 50 of 164 citing papers.

Understanding Stragglers in Large Model Training Using What-if Analysis
Jinkun Lin, Ziheng Jiang, Zuquan Song, Sida Zhao, Menghan Yu, ..., Shuguang Wang, Haibin Lin, Xin Liu, Aurojit Panda, Jinyang Li
09 May 2025

Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
Yehui Tang, Yichun Yin, Yaoyuan Wang, Hang Zhou, Yu Pan, ..., Zhe Liu, Zhicheng Liu, Z. Tu, Zilin Ding, Zongyuan Zhan
07 May 2025 [MoE]

Galvatron: An Automatic Distributed System for Efficient Foundation Model Training
Xinyi Liu, Y. Wang, Shenhan Zhu, Fangcheng Fu, Qingshuo Liu, Guangming Lin, Bin Cui
30 Apr 2025 [GNN]

Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen, J. Li, Yixin Ji, Z. Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Z. Wang, Baoxing Huai, M. Zhang
28 Apr 2025 [LLMAG]

From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs
Jiliang Ni, Jiachen Pu, Zhongyi Yang, Kun Zhou, Hui Wang, Xiaoliang Xiao, Dakui Wang, Xin Li, Jingfeng Luo, Conggang Hu
18 Apr 2025

NNTile: a machine learning framework capable of training extremely large GPT language models on a single node
A. Mikhalev, Aleksandr Katrutsa, Konstantin Sozykin, Ivan V. Oseledets
17 Apr 2025

OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training
Juntao Zhao, Qi Lu, Wei Jia, Borui Wan, Lei Zuo, ..., Y. Hu, Yanghua Peng, H. Lin, Xin Liu, Chuan Wu
14 Apr 2025 [AI4CE]

Kimi-VL Technical Report
Kimi Team, Angang Du, B. Yin, Bowei Xing, Bowen Qu, ..., Zhiqi Huang, Zihao Huang, Zijia Zhao, Z. Chen, Zongyu Lin
10 Apr 2025 [MLLM, VLM, MoE]

STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Kazuki Yano, Takumi Ito, Jun Suzuki
05 Apr 2025 [LRM]

UniEDU: A Unified Language and Vision Assistant for Education Applications
Zhendong Chu, Jian Xie, Shen Wang, Z. Wang, Qingsong Wen
26 Mar 2025 [AI4Ed]

Maya: Optimizing Deep Learning Training Workloads using Emulated Virtual Accelerators
Srihas Yarlagadda, A. Agrawal, Elton Pinto, Hakesh Darapaneni, Mitali Meratwal, Shivam Mittal, Pranavi Bajjuri, S., Alexey Tumanov
26 Mar 2025

Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
Zhanda Zhu, Christina Giannoula, Muralidhar Andoorveedu, Qidong Su, Karttikeya Mangalam, Bojian Zheng, Gennady Pekhimenko
24 Mar 2025 [VLM, MoE]

WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training
Z. Wang, Anna Cai, Xinfeng Xie, Zaifeng Pan, Yue Guan, ..., Shikai Li, Jianyu Huang, Chris Cai, Yuchen Hao, Yufei Ding
23 Mar 2025

ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism
Venmugil Elango
20 Mar 2025

The Lucie-7B LLM and the Lucie Training Dataset: Open resources for multilingual language generation
Olivier Gouvert, Julie Hunter, Jérôme Louradour, Christophe Cerisara, Evan Dufraisse, Yaya Sy, Laura Rivière, Jean-Pierre Lorré, OpenLLM-France community
15 Mar 2025

Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun, Disen Lan, Tong Zhu, Xiaoye Qu, Yu-Xi Cheng
07 Mar 2025 [MoE]

PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization
Xinyi Wan, Penghui Qi, Guangxing Huang, Jialin Li, Min Lin
03 Mar 2025

ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs
Hao Ge, Junda Feng, Qi Huang, Fangcheng Fu, Xiaonan Nie, Lei Zuo, Haibin Lin, Bin Cui, Xin Liu
28 Feb 2025

PaCA: Partial Connection Adaptation for Efficient Fine-Tuning
Sunghyeon Woo, Sol Namkung, Sunwoo Lee, Inho Jeong, Beomseok Kim, Dongsuk Jeon
28 Feb 2025

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding
Tian Jin, Ellie Y. Cheng, Zack Ankner, Nikunj Saunshi, Blake M. Elias, Amir Yazdanbakhsh, Jonathan Ragan-Kelley, Suvinay Subramanian, Michael Carbin
24 Feb 2025

Understanding Silent Data Corruption in LLM Training
Jeffrey Ma, Hengzhi Pei, Leonard Lausen, George Karypis
17 Feb 2025

Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization
Bowen Pang, Kai Li, Ruifeng She, Feifan Wang
14 Feb 2025 [OffRL]

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
Guoqing Ma, Haoyang Huang, K. Yan, L. Chen, Nan Duan, ..., Y. Wang, Yuanwei Lu, Yu-Cheng Chen, Yu-Juan Luo, Y. Luo
14 Feb 2025 [DiffM, VGen]

Gradient Multi-Normalization for Stateless and Scalable LLM Training
M. Scetbon, Chao Ma, Wenbo Gong, Edward Meeds
10 Feb 2025

LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Tzu-Tao Chang, Shivaram Venkataraman
04 Feb 2025 [VLM]

A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He, Rui Mao, Qika Lin, Yucheng Ruan, Xiang Lan, Mengling Feng, Erik Cambria
28 Jan 2025 [LM&MA, AILaw]

A Survey on Memory-Efficient Large-Scale Model Training in AI for Science
Kaiyuan Tian, Linbo Qiao, Baihui Liu, Gongqingjian Jiang, Dongsheng Li
21 Jan 2025

Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar
30 Dec 2024

FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism
Y. Wang, Shiju Wang, Shenhan Zhu, Fangcheng Fu, Xinyi Liu, Xuefeng Xiao, Huixia Li, Jiashi Li, Faming Wu, Bin Cui
02 Dec 2024

Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution
Haiquan Wang, Chaoyi Ruan, Jia He, Jiaqi Ruan, Chengjie Tang, Xiaosong Ma, Cheng-rong Li
24 Nov 2024

Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez, Luca Wehrstedt, Leonid Shamis, Mostafa Elhoushi, Kalyan Saladi, Yonatan Bisk, Emma Strubell, Jacob Kahn
20 Nov 2024

Accelerating Large Language Model Training with 4D Parallelism and Memory Consumption Estimator
Kazuki Fujii, Kohei Watanabe, Rio Yokota
10 Nov 2024

Context Parallelism for Scalable Million-Token Inference
Amy Yang, Jingyi Yang, Aya Ibrahim, Xinfeng Xie, Bangsheng Tang, Grigory Sizov, Jeremy Reizenstein, Jongsoo Park, Jianyu Huang
04 Nov 2024 [MoE, LRM]

MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffic-Aware Parallel Optimization
J. Guo, Yan Liu, Yu Meng, Zhiwei Tao, Banglan Liu, Gang Chen, Xiang Li
01 Nov 2024 [MoE]

SimpleFSDP: Simpler Fully Sharded Data Parallel with torch.compile
Ruisi Zhang, Tianyu Liu, Will Feng, Andrew Gu, Sanket Purandare, Wanchao Liang, Francisco Massa
01 Nov 2024

Extralonger: Toward a Unified Perspective of Spatial-Temporal Factors for Extra-Long-Term Traffic Forecasting
Zhiwei Zhang, Shaojun E, Fandong Meng, Jie Zhou, Wenjuan Han
30 Oct 2024

Revisiting Reliability in Large-Scale Machine Learning Research Clusters
Apostolos Kokolis, Michael Kuchnik, John Hoffman, Adithya Kumar, Parth Malani, Faye Ma, Zachary DeVito, S., Kalyan Saladi, Carole-Jean Wu
29 Oct 2024

Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
Minhyuk Seo, Hyunseo Koh, Jonghyun Choi
19 Oct 2024

Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization
Haoyang Li, Fangcheng Fu, Hao Ge, Sheng Lin, Xuanyu Wang, Jiawen Niu, Y. Wang, Hailin Zhang, Xiaonan Nie, Bin Cui
17 Oct 2024 [MoMe]

FALCON: Pinpointing and Mitigating Stragglers for Large-Scale Hybrid-Parallel Training
Tianyuan Wu, Wei Wang, Yinghao Yu, Siran Yang, Wenchao Wu, Qinkai Duan, Guodong Yang, Jiamang Wang, Lin Qu, Liping Zhang
16 Oct 2024

Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM
Haiyue Ma, Jian Liu, Ronny Krashinsky
10 Oct 2024

TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training
Wanchao Liang, Tianyu Liu, Less Wright, Will Constable, Andrew Gu, ..., Howard Huang, Junjie Wang, Sanket Purandare, Gokul Nadathur, Stratos Idreos
09 Oct 2024 [OffRL]

Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs
Mehdi Ali, Michael Fromm, Klaudia Thellmann, Jan Ebert, Alexander Arno Weber, ..., René Jäkel, Georg Rehm, Stefan Kesselheim, Joachim Köhler, Nicolas Flores-Herr
30 Sep 2024

Hyper-Connections
Defa Zhu, Hongzhi Huang, Zihao Huang, Yutao Zeng, Yunyao Mao, Banggu Wu, Qiyang Min, Xun Zhou
29 Sep 2024

Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping
Guanhua Wang, Chengming Zhang, Zheyu Shen, Ang Li, Olatunji Ruwase
23 Sep 2024

CSPS: A Communication-Efficient Sequence-Parallelism based Serving System for Transformer based Models with Long Prompts
Zeyu Zhang, Haiying Shen
23 Sep 2024 [VLM]

Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML
Chelsea Maria John, Stepan Nassyr, Carolin Penke, A. Herten
19 Sep 2024

Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Jinghan Yao, Sam Ade Jacobs, Masahiro Tanaka, Olatunji Ruwase, A. Shafi, D. Panda
30 Aug 2024

Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, ..., Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Yuheng Zou
26 Aug 2024

Real-Time Video Generation with Pyramid Attention Broadcast
Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You
22 Aug 2024 [VGen, DiffM]