PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers

5 February 2021

Salman Avestimehr

Papers citing "PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers"

22 / 22 papers shown

An Analysis of Layer-Freezing Strategies for Enhanced Transfer Learning in YOLO Architectures

Andrzej D. Dobrzycki

Ana M. Bernardos

José Ramón Casar

132

05 Sep 2025

Balanced and Elastic End-to-end Training of Dynamic LLMs

Mohamed Wahib

Muhammed Abdullah Soyturk

Didem Unat

MoE

379

20 May 2025

ORBIT-2: Scaling Exascale Vision Foundation Models for Weather and Climate Downscaling

...

321

07 May 2025

TAGC: Optimizing Gradient Communication in Distributed Transformer Training

Igor Polyakov

Alexey Dukhanov

Egor Spirin

307

08 Apr 2025

Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization

European Conference on Computer Systems (EuroSys), 2025

Zhanda Zhu

Christina Giannoula

Muralidhar Andoorveedu

261

24 Mar 2025

ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability

Wei Zhang

286

23 Apr 2024

Private Knowledge Sharing in Distributed Learning: A Survey

Ming Ding

253

08 Feb 2024

SmartFRZ: An Efficient Training Framework using Attention-Based Layer Freezing

403

30 Jan 2024

RTP: Rethinking Tensor Parallelism with Memory Deduplication

Cheng Luo

Tianle Zhong

Geoffrey C. Fox

198

02 Nov 2023

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

Proceedings of the VLDB Endowment (PVLDB), 2023

...

485

628

21 Apr 2023

ProGAP: Progressive Graph Neural Networks with Differential Privacy Guarantees

Web Search and Data Mining (WSDM), 2023

Sina Sajadmanesh

D. Gática-Pérez

448

18 Apr 2023

Efficient Self-supervised Continual Learning with Progressive Task-correlated Layer Freezing

IEEE International Symposium on Quality Electronic Design (ISQED), 2023

311

13 Mar 2023

Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training

Neural Information Processing Systems (NeurIPS), 2022

327

22 Sep 2022

Embedding Recycling for Language Models

Findings (Findings), 2022

Jon Saad-Falcon

Amanpreet Singh

Luca Soldaini

Mike D'Arcy

Arman Cohan

Doug Downey

KELM

229

11 Jul 2022

Federated Learning in Non-IID Settings Aided by Differentially Private Synthetic Data

Huancheng Chen

H. Vikalo

FedML

356

01 Jun 2022

End-to-end Adaptive Distributed Training on PaddlePaddle

Dianhai Yu

...

294

06 Dec 2021

Pipeline Parallelism for Inference on Heterogeneous Edge Computing

330

28 Oct 2021

ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training

International Conference on Machine Learning (ICML), 2021

Mario Fritz

370

11 Oct 2021

BAGUA: Scaling up Distributed Learning with System Relaxations

...

Jiawei Jiang

411

03 Jul 2021

Subgraph Federated Learning with Missing Neighbor Generation

Lichao Sun

596

257

25 Jun 2021

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2021

Deepak Narayanan

...

896

1,080

09 Apr 2021

Reservoir Transformers

Annual Meeting of the Association for Computational Linguistics (ACL), 2020

Douwe Kiela

479

30 Dec 2020