arXiv:2102.03161
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers
5 February 2021
Chaoyang He
Shen Li
Mahdi Soltanolkotabi
Salman Avestimehr
Papers citing "PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers"
22 / 22 papers shown
An Analysis of Layer-Freezing Strategies for Enhanced Transfer Learning in YOLO Architectures
Andrzej D. Dobrzycki
Ana M. Bernardos
José Ramón Casar
05 Sep 2025
Balanced and Elastic End-to-end Training of Dynamic LLMs
Mohamed Wahib
Muhammed Abdullah Soyturk
Didem Unat
MoE
20 May 2025
ORBIT-2: Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling
Xiao Wang
Jong Youl Choi
Takuya Kurihana
Isaac Lyngaas
Hong-Jun Yoon
...
Dali Wang
Peter Thornton
Prasanna Balaprakash
M. Ashfaq
Dan Lu
07 May 2025
TAGC: Optimizing Gradient Communication in Distributed Transformer Training
Igor Polyakov
Alexey Dukhanov
Egor Spirin
08 Apr 2025
Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
European Conference on Computer Systems (EuroSys), 2025
Zhanda Zhu
Christina Giannoula
Muralidhar Andoorveedu
Qidong Su
Karttikeya Mangalam
Bojian Zheng
Gennady Pekhimenko
VLM
MoE
24 Mar 2025
ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability
Xiao Wang
A. Tsaris
Siyan Liu
Jong Youl Choi
Ming Fan
Wei Zhang
Ju Yin
M. Ashfaq
Dan Lu
Dali Wang
23 Apr 2024
Private Knowledge Sharing in Distributed Learning: A Survey
Yasas Supeksala
Dinh C. Nguyen
Ming Ding
Thilina Ranbaduge
Calson Chua
Jun Zhang
Jun Li
H. Vincent Poor
08 Feb 2024
SmartFRZ: An Efficient Training Framework using Attention-Based Layer Freezing
Sheng Li
Geng Yuan
Yuezhen Dai
Youtao Zhang
Yanzhi Wang
Xulong Tang
30 Jan 2024
RTP: Rethinking Tensor Parallelism with Memory Deduplication
Cheng Luo
Tianle Zhong
Geoffrey C. Fox
02 Nov 2023
PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
Proceedings of the VLDB Endowment (PVLDB), 2023
Yanli Zhao
Andrew Gu
R. Varma
Liangchen Luo
Chien-chin Huang
...
Bernard Nguyen
Geeta Chauhan
Y. Hao
Ajit Mathews
Shen Li
FedML
MoE
21 Apr 2023
ProGAP: Progressive Graph Neural Networks with Differential Privacy Guarantees
Web Search and Data Mining (WSDM), 2023
Sina Sajadmanesh
D. Gática-Pérez
18 Apr 2023
Efficient Self-supervised Continual Learning with Progressive Task-correlated Layer Freezing
IEEE International Symposium on Quality Electronic Design (ISQED), 2023
Li Yang
Sen Lin
Fan Zhang
Junshan Zhang
Deliang Fan
CLL
13 Mar 2023
Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training
Neural Information Processing Systems (NeurIPS), 2022
Geng Yuan
Yanyu Li
Sheng Li
Zhenglun Kong
Sergey Tulyakov
Xulong Tang
Yanzhi Wang
Jian Ren
22 Sep 2022
Embedding Recycling for Language Models
Findings (Findings), 2022
Jon Saad-Falcon
Amanpreet Singh
Luca Soldaini
Mike D'Arcy
Arman Cohan
Doug Downey
KELM
11 Jul 2022
Federated Learning in Non-IID Settings Aided by Differentially Private Synthetic Data
Huancheng Chen
H. Vikalo
FedML
01 Jun 2022
End-to-end Adaptive Distributed Training on PaddlePaddle
Yulong Ao
Zhihua Wu
Dianhai Yu
Weibao Gong
Zhiqing Kui
...
Yanjun Ma
Tian Wu
Haifeng Wang
Wei Zeng
Chao Yang
06 Dec 2021
Pipeline Parallelism for Inference on Heterogeneous Edge Computing
Yang Hu
Connor Imes
Xuanang Zhao
Souvik Kundu
Peter A. Beerel
S. Crago
J. Walters
MoE
28 Oct 2021
ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training
International Conference on Machine Learning (ICML), 2021
Hui-Po Wang
Sebastian U. Stich
Yang He
Mario Fritz
FedML
AI4CE
11 Oct 2021
BAGUA: Scaling up Distributed Learning with System Relaxations
Shaoduo Gan
Xiangru Lian
Rui Wang
Jianbin Chang
Chengjun Liu
...
Jiawei Jiang
Binhang Yuan
Sen Yang
Ji Liu
Ce Zhang
03 Jul 2021
Subgraph Federated Learning with Missing Neighbor Generation
Ke Zhang
Carl Yang
Xiaoxiao Li
Lichao Sun
Siu-Ming Yiu
FedML
25 Jun 2021
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021
Deepak Narayanan
Mohammad Shoeybi
Jared Casper
P. LeGresley
M. Patwary
...
Prethvi Kashinkunti
J. Bernauer
Bryan Catanzaro
Amar Phanishayee
Matei A. Zaharia
MoE
09 Apr 2021
Reservoir Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Sheng Shen
Alexei Baevski
Ari S. Morcos
Kurt Keutzer
Michael Auli
Douwe Kiela
30 Dec 2020