2010.13103
Cited By
LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference
25 October 2020
Yujeong Choi
Yunseong Kim
Minsoo Rhu
Papers citing
"LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference"
24 / 24 papers shown
Title
Patchwork: A Unified Framework for RAG Serving
Bodun Hu
Luis Pabon
Saurabh Agarwal
Aditya Akella
21
0
0
01 May 2025
SMDP-Based Dynamic Batching for Improving Responsiveness and Energy Efficiency of Batch Services
Yaodan Xu
Sheng Zhou
Zhisheng Niu
31
2
0
04 Jan 2025
PREBA: A Hardware/Software Co-Design for Multi-Instance GPU based AI Inference Servers
Gwangoo Yeo
Jiin Kim
Yujeong Choi
Minsoo Rhu
74
0
0
28 Nov 2024
Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization
Yangjie Zhou
Honglin Zhu
Qian Qiu
Weihao Cui
Zihan Liu
...
Jintao Meng
Haidong Lan
Jingwen Leng
Wenxi Zhu
Minwen Deng
36
0
0
02 Sep 2024
Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling
Kamran Razavi
Saeid Ghafouri
Max Mühlhäuser
Pooyan Jamshidi
Lin Wang
24
3
0
31 Mar 2024
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
Cong Guo
Rui Zhang
Jiale Xu
Jingwen Leng
Zihan Liu
...
Minyi Guo
Hao Wu
Shouren Zhao
Junping Zhao
Ke Zhang
VLM
78
10
0
16 Jan 2024
Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
Yinwei Dai
Rui Pan
Anand Iyer
Kai Li
Ravi Netravali
24
7
0
08 Dec 2023
vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training
Jehyeon Bang
Yujeong Choi
Myeongwoo Kim
Yongdeok Kim
Minsoo Rhu
22
15
0
27 Nov 2023
A Survey of Serverless Machine Learning Model Inference
Kamil Kojs
30
2
0
22 Nov 2023
MOSEL: Inference Serving Using Dynamic Modality Selection
Bodun Hu
Le Xu
Jeongyoon Moon
N. Yadwadkar
Aditya Akella
11
4
0
27 Oct 2023
EdgeMatrix: A Resource-Redefined Scheduling Framework for SLA-Guaranteed Multi-Tier Edge-Cloud Computing Systems
Shihao Shen
Yuanming Ren
Yanli Ju
Xiaofei Wang
Wenyu Wang
Victor C. M. Leung
16
15
0
01 Aug 2023
Adaptive Scheduling for Edge-Assisted DNN Serving
Jian He
Chen-Shun Yang
Zhaoyuan He
Ghufran Baig
L. Qiu
11
0
0
19 Apr 2023
Clover: Toward Sustainable AI with Carbon-Aware Machine Learning Inference Service
Baolin Li
S. Samsi
V. Gadepally
Devesh Tiwari
20
27
0
19 Apr 2023
SMDP-Based Dynamic Batching for Efficient Inference on GPU-Based Platforms
Yaodan Xu
Jingzhou Sun
Sheng Zhou
Z. Niu
16
5
0
30 Jan 2023
KAIROS: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources
Baolin Li
S. Samsi
V. Gadepally
Devesh Tiwari
17
11
0
12 Oct 2022
Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs
Alexandros Kouris
Stylianos I. Venieris
Stefanos Laskaridis
Nicholas D. Lane
30
8
0
27 Sep 2022
ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Cong Guo
Chen Zhang
Jingwen Leng
Zihan Liu
Fan Yang
Yun-Bo Liu
Minyi Guo
Yuhao Zhu
MQ
16
54
0
30 Aug 2022
Multi-user Co-inference with Batch Processing Capable Edge Server
Wenqi Shi
Sheng Zhou
Z. Niu
Miao Jiang
Lu Geng
19
21
0
03 Jun 2022
Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation
Liu Ke
Udit Gupta
Mark Hempstead
Carole-Jean Wu
Hsien-Hsin S. Lee
Xuan Zhang
19
21
0
14 Mar 2022
PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers
Yunseong Kim
Yujeong Choi
Minsoo Rhu
13
15
0
27 Feb 2022
VELTAIR: Towards High-Performance Multi-tenant Deep Learning Services via Adaptive Compilation and Scheduling
Zihan Liu
Jingwen Leng
Zhihui Zhang
Quan Chen
Chao Li
M. Guo
19
46
0
17 Jan 2022
Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning
S. Choi
Sunho Lee
Yeonjae Kim
Jongse Park
Youngjin Kwon
Jaehyuk Huh
19
21
0
01 Sep 2021
GPU Domain Specialization via Composable On-Package Architecture
Yaosheng Fu
Evgeny Bolotin
Niladrish Chatterjee
D. Nellans
S. Keckler
12
12
0
05 Apr 2021
Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training
Youngeun Kwon
Yunjae Lee
Minsoo Rhu
16
39
0
25 Oct 2020