Systems for Parallel and Distributed Large-Model Deep Learning Training
Kabir Nagrecha
6 January 2023
arXiv: 2301.02691
GNN, VLM, MoE
Papers citing "Systems for Parallel and Distributed Large-Model Deep Learning Training"
9 / 9 papers shown

A Comparative Analysis of Distributed Training Strategies for GPT-2
Ishan Patwardhan, Shubham Gandhi, Om M. Khare, Amit Joshi, Suraj Sawant
27 | 1 | 0
24 May 2024

Applications of Large Scale Foundation Models for Autonomous Driving
Yu Huang, Yue Chen, Zhu Li
ELM, AI4CE, LRM, ALM, LM&Ro
46 | 15 | 0
20 Nov 2023

Saturn: An Optimized Data System for Large Model Deep Learning Workloads
Kabir Nagrecha, Arun Kumar
11 | 6 | 0
03 Sep 2023

InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models
Kabir Nagrecha, Lingyi Liu, P. Delgado, Prasanna Padmanabhan
OffRL, AI4CE
25 | 5 | 0
13 Aug 2023

Hydra: A System for Large Multi-Model Deep Learning
Kabir Nagrecha, Arun Kumar
MoE, AI4CE
30 | 5 | 0
16 Oct 2021

Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
Shigang Li, Torsten Hoefler
GNN, AI4CE, LRM
77 | 131 | 0
14 Jul 2021

Model-Parallel Model Selection for Deep Learning Systems
Kabir Nagrecha
29 | 16 | 0
14 Jul 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He
MoE
160 | 413 | 0
18 Jan 2021

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE
243 | 1,817 | 0
17 Sep 2019