2505.04021
Cited By
v1
v2 (latest)
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
6 May 2025
Shan Yu
Jiarong Xing
Yifan Qiao
Mingyuan Ma
Y. Li
Yang Wang
Shuo Yang
Zhiqiang Xie
Shiyi Cao
Ke Bao
Ion Stoica
Harry Xu
Ying Sheng
Papers citing
"Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving"
10 / 10 papers shown
Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity
Wenbin Zhu
Zhaoyan Shen
Z. Shao
Hongjun Dai
Feng Chen
76
0
0
01 Dec 2025
From Models to Operators: Rethinking Autoscaling Granularity for Large Generative Models
Xingqi Cui
Chieh-Jan Mike Liang
Jiarong Xing
Haoran Qiu
152
0
0
04 Nov 2025
xLLM Technical Report
T. Liu
Tao Peng
Peijun Yang
X. Zhao
Xiusheng Lu
...
Hailong Yang
Jing-Jing Li
Guiguang Ding
Ke Zhang
Ke Zhang
214
2
0
16 Oct 2025
FairBatching: Fairness-Aware Batch Formation for LLM Inference
Hongtao Lyu
Boyue Liu
Mingyu Wu
Haibo Chen
124
3
0
16 Oct 2025
Prior-Aligned Meta-RL: Thompson Sampling with Learned Priors and Guarantees in Finite-Horizon MDPs
Runlin Zhou
Chixiang Chen
Elynn Chen
OffRL
173
20
0
06 Oct 2025
FineServe: Precision-Aware KV Slab and Two-Level Scheduling for Heterogeneous Precision LLM Serving
Kyungmin Bin
Seungbeom Choi
Jimyoung Son
Jieun Choi
Daseul Bae
Daehyeon Baek
Kihyo Moon
Minsung Jang
Hyojung Lee
161
3
0
08 Sep 2025
Rethinking Caching for LLM Serving Systems: Beyond Traditional Heuristics
Jungwoo Kim
Minsang Kim
Jaeheon Lee
Chanwoo Moon
Heejin Kim
Taeho Hwang
Woosuk Chung
Yeseong Kim
Sungjin Lee
218
0
0
26 Aug 2025
SLOs-Serve: Optimized Serving of Multi-SLO LLMs
Siyuan Chen
Zhipeng Jia
S. Khan
Arvind Krishnamurthy
Phillip B. Gibbons
302
14
0
05 Apr 2025
Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation
Jingzhi Fang
Yanyan Shen
Yijiao Wang
Lei Chen
200
4
0
21 Mar 2025
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Ramya Prabhu
Ajay Nayak
Jayashree Mohan
Ramachandran Ramjee
Ashish Panwar
VLM
509
81
0
07 May 2024