Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
v2 (latest)
6 May 2025
Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Y. Li, Yang Wang, Shuo Yang, Zhiqiang Xie, Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng
Links: ArXiv (abs) · PDF · HTML · GitHub

Papers citing "Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving"

Showing 10 of 10 papers.

Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity
Wenbin Zhu, Zhaoyan Shen, Z. Shao, Hongjun Dai, Feng Chen
01 Dec 2025

From Models to Operators: Rethinking Autoscaling Granularity for Large Generative Models
Xingqi Cui, Chieh-Jan Mike Liang, Jiarong Xing, Haoran Qiu
04 Nov 2025

xLLM Technical Report
T. Liu, Tao Peng, Peijun Yang, X. Zhao, Xiusheng Lu, ..., Hailong Yang, Jing-Jing Li, Guiguang Ding, Ke Zhang
16 Oct 2025

FairBatching: Fairness-Aware Batch Formation for LLM Inference
Hongtao Lyu, Boyue Liu, Mingyu Wu, Haibo Chen
16 Oct 2025

Prior-Aligned Meta-RL: Thompson Sampling with Learned Priors and Guarantees in Finite-Horizon MDPs
Runlin Zhou, Chixiang Chen, Elynn Chen
Community: OffRL
06 Oct 2025

FineServe: Precision-Aware KV Slab and Two-Level Scheduling for Heterogeneous Precision LLM Serving
Kyungmin Bin, Seungbeom Choi, Jimyoung Son, Jieun Choi, Daseul Bae, Daehyeon Baek, Kihyo Moon, Minsung Jang, Hyojung Lee
08 Sep 2025

Rethinking Caching for LLM Serving Systems: Beyond Traditional Heuristics
Jungwoo Kim, Minsang Kim, Jaeheon Lee, Chanwoo Moon, Heejin Kim, Taeho Hwang, Woosuk Chung, Yeseong Kim, Sungjin Lee
26 Aug 2025

SLOs-Serve: Optimized Serving of Multi-SLO LLMs
Siyuan Chen, Zhipeng Jia, S. Khan, Arvind Krishnamurthy, Phillip B. Gibbons
05 Apr 2025

Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation
Jingzhi Fang, Yanyan Shen, Yijiao Wang, Lei Chen
21 Mar 2025

vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, Ashish Panwar
Community: VLM
07 May 2024