LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference

25 October 2020

Papers citing "LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference"

Title
Patchwork: A Unified Framework for RAG Serving Bodun Hu Luis Pabon Saurabh Agarwal Aditya Akella 21 0 0 01 May 2025
SMDP-Based Dynamic Batching for Improving Responsiveness and Energy Efficiency of Batch Services Yaodan Xu Sheng Zhou Zhisheng Niu 31 2 0 04 Jan 2025
PREBA: A Hardware/Software Co-Design for Multi-Instance GPU based AI Inference Servers Gwangoo Yeo Jiin Kim Yujeong Choi Minsoo Rhu 74 0 0 28 Nov 2024
Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization Yangjie Zhou Honglin Zhu Qian Qiu Weihao Cui Zihan Liu ... Jintao Meng Haidong Lan Jingwen Leng Wenxi Zhu Minwen Deng 36 0 0 02 Sep 2024
Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling Kamran Razavi Saeid Ghafouri Max Mühlhäuser Pooyan Jamshidi Lin Wang 24 3 0 31 Mar 2024
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching Cong Guo Rui Zhang Jiale Xu Jingwen Leng Zihan Liu ... Minyi Guo Hao Wu Shouren Zhao Junping Zhao Ke Zhang VLM 78 10 0 16 Jan 2024
Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving Yinwei Dai Rui Pan Anand Iyer Kai Li Ravi Netravali 24 7 0 08 Dec 2023
vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training Jehyeon Bang Yujeong Choi Myeongwoo Kim Yongdeok Kim Minsoo Rhu 22 15 0 27 Nov 2023
A Survey of Serverless Machine Learning Model Inference Kamil Kojs 30 2 0 22 Nov 2023
MOSEL: Inference Serving Using Dynamic Modality Selection Bodun Hu Le Xu Jeongyoon Moon N. Yadwadkar Aditya Akella 11 4 0 27 Oct 2023
EdgeMatrix: A Resource-Redefined Scheduling Framework for SLA-Guaranteed Multi-Tier Edge-Cloud Computing Systems Shihao Shen Yuanming Ren Yanli Ju Xiaofei Wang Wenyu Wang Victor C. M. Leung 16 15 0 01 Aug 2023
Adaptive Scheduling for Edge-Assisted DNN Serving Jian He Chen-Shun Yang Zhaoyuan He Ghufran Baig L. Qiu 11 0 0 19 Apr 2023
Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service Baolin Li S. Samsi V. Gadepally Devesh Tiwari 20 27 0 19 Apr 2023
SMDP-Based Dynamic Batching for Efficient Inference on GPU-Based Platforms Yaodan Xu Jingzhou Sun Sheng Zhou Z. Niu 16 5 0 30 Jan 2023
KAIROS: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources Baolin Li S. Samsi V. Gadepally Devesh Tiwari 17 11 0 12 Oct 2022
Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs Alexandros Kouris Stylianos I. Venieris Stefanos Laskaridis Nicholas D. Lane 30 8 0 27 Sep 2022
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization Cong Guo Chen Zhang Jingwen Leng Zihan Liu Fan Yang Yun-Bo Liu Minyi Guo Yuhao Zhu MQ 16 54 0 30 Aug 2022
Multi-user Co-inference with Batch Processing Capable Edge Server Wenqi Shi Sheng Zhou Z. Niu Miao Jiang Lu Geng 19 21 0 03 Jun 2022
Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation Liu Ke Udit Gupta Mark Hempstead Carole-Jean Wu Hsien-Hsin S. Lee Xuan Zhang 19 21 0 14 Mar 2022
PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers Yunseong Kim Yujeong Choi Minsoo Rhu 13 15 0 27 Feb 2022
VELTAIR: Towards High-Performance Multi-tenant Deep Learning Services via Adaptive Compilation and Scheduling Zihan Liu Jingwen Leng Zhihui Zhang Quan Chen Chao Li M. Guo 19 46 0 17 Jan 2022
Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning S. Choi Sunho Lee Yeonjae Kim Jongse Park Youngjin Kwon Jaehyuk Huh 19 21 0 01 Sep 2021
GPU Domain Specialization via Composable On-Package Architecture Yaosheng Fu Evgeny Bolotin Niladrish Chatterjee D. Nellans S. Keckler 12 12 0 05 Apr 2021
Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training Youngeun Kwon Yunjae Lee Minsoo Rhu 16 39 0 25 Oct 2020