Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization. European Conference on Computer Systems (EuroSys), 2025.
Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters. AAAI Conference on Artificial Intelligence (AAAI), 2024.
Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates. Symposium on Operating Systems Principles (SOSP), 2023.
Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training. IEEE Transactions on Parallel and Distributed Systems (TPDS), 2023.