On Optimizing the Communication of Model Parallelism. Conference on Machine Learning and Systems (MLSys), 2022 |
Impact of RoCE Congestion Control Policies on Distributed Training of DNNs. IEEE Symposium on High-Performance Interconnects (HI), 2022 |
Efficient Direct-Connect Topologies for Collective Communications. Symposium on Networked Systems Design and Implementation (NSDI), 2022 |
Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models. International Symposium on Computer Architecture (ISCA), 2021 |