LithOS: An Operating System for Efficient Machine Learning on GPUs. Symposium on Operating Systems Principles (SOSP), 2025 |
Privacy-Aware Joint DNN Model Deployment and Partitioning Optimization for Collaborative Edge Inference Services. IEEE Transactions on Services Computing (TSC), 2025 |
ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments. International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024 |
HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions. International Workshop on Quality of Service (IWQoS), 2024 |
Ultima: Robust and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud. Symposium on Networked Systems Design and Implementation (NSDI), 2023 |
Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service. International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2023 |