ResearchTrend.AI
SLOs-Serve: Optimized Serving of Multi-SLO LLMs

5 April 2025
Siyuan Chen, Zhipeng Jia, S. Khan, Arvind Krishnamurthy, Phillip B. Gibbons
arXiv:2504.08784 (abs | PDF | HTML)

Papers citing "SLOs-Serve: Optimized Serving of Multi-SLO LLMs"

5 / 5 papers shown
TokenFlow: Responsive LLM Text Streaming Serving under Request Burst via Preemptive Scheduling
Junyi Chen, Chuheng Du, Renyuan Liu, Shuochao Yao, Dingtian Yan, Jiang Liao, Shengzhong Liu, Fan Wu, Guihai Chen
03 Oct 2025
HyperFlexis: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling
Zahra Yousefijamarani, Xinglu Wang, Qian Wang, Morgan Lindsay Heisler, Taha Shabani, ..., Xiaolong Bai, Jiannan Wang, Ying Xiong, Yong Zhang, Zhenan Fan
21 Aug 2025
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Y. Li, ..., Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng
06 May 2025
Patchwork: A Unified Framework for RAG Serving
Bodun Hu, Luis Pabon, Saurabh Agarwal, Aditya Akella
01 May 2025
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, Ashish Panwar
VLM
07 May 2024