- Poplar: Efficient Scaling of Distributed DNN Training on Heterogeneous GPU Clusters. AAAI Conference on Artificial Intelligence (AAAI), 2024.
- ED-Batch: Efficient Automatic Batching of Dynamic Neural Networks via Learned Finite State Machines. International Conference on Machine Learning (ICML), 2023.
- Baechi: Fast Device Placement of Machine Learning Graphs. ACM Symposium on Cloud Computing (SoCC), 2020.
- PiPAD: Pipelined and Parallel Dynamic GNN Training on GPUs. ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPoPP), 2023.
- A Fast Post-Training Pruning Framework for Transformers. Neural Information Processing Systems (NeurIPS), 2022.
- Pathways: Asynchronous Distributed Dataflow for ML. Conference on Machine Learning and Systems (MLSys), 2022.
- Optimal Channel Selection with Discrete QCQP. International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
- Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs. Neural Information Processing Systems (NeurIPS), 2022.
- GPUReplay: A 50-KB GPU Stack for Client ML. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2021.
- IOS: Inter-Operator Scheduler for CNN Acceleration. Conference on Machine Learning and Systems (MLSys), 2020.