Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
v2 (latest)
6 May 2025
Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Y. Li, Yang Wang, Shuo Yang, Zhiqiang Xie, Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng
Links: ArXiv (abs) · PDF · HTML · GitHub

Papers citing "Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving"

Showing 10 of 10 papers.

Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity
Wenbin Zhu, Zhaoyan Shen, Z. Shao, Hongjun Dai, Feng Chen
01 Dec 2025

From Models to Operators: Rethinking Autoscaling Granularity for Large Generative Models
Xingqi Cui, Chieh-Jan Mike Liang, Jiarong Xing, Haoran Qiu
04 Nov 2025

xLLM Technical Report
T. Liu, Tao Peng, Peijun Yang, X. Zhao, Xiusheng Lu, ..., Hailong Yang, Jing-Jing Li, Guiguang Ding, Ke Zhang
16 Oct 2025

FairBatching: Fairness-Aware Batch Formation for LLM Inference
Hongtao Lyu, Boyue Liu, Mingyu Wu, Haibo Chen
16 Oct 2025

Prior-Aligned Meta-RL: Thompson Sampling with Learned Priors and Guarantees in Finite-Horizon MDPs
Runlin Zhou, Chixiang Chen, Elynn Chen
Community: OffRL
06 Oct 2025

FineServe: Precision-Aware KV Slab and Two-Level Scheduling for Heterogeneous Precision LLM Serving
Kyungmin Bin, Seungbeom Choi, Jimyoung Son, Jieun Choi, Daseul Bae, Daehyeon Baek, Kihyo Moon, Minsung Jang, Hyojung Lee
08 Sep 2025

Rethinking Caching for LLM Serving Systems: Beyond Traditional Heuristics
Jungwoo Kim, Minsang Kim, Jaeheon Lee, Chanwoo Moon, Heejin Kim, Taeho Hwang, Woosuk Chung, Yeseong Kim, Sungjin Lee
26 Aug 2025

SLOs-Serve: Optimized Serving of Multi-SLO LLMs
Siyuan Chen, Zhipeng Jia, S. Khan, Arvind Krishnamurthy, Phillip B. Gibbons
05 Apr 2025

Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation
Jingzhi Fang, Yanyan Shen, Yijiao Wang, Lei Chen
21 Mar 2025

vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, Ashish Panwar
Community: VLM
07 May 2024