Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2408.08147
Cited By
P/D-Serve: Serving Disaggregated Large Language Model at Scale
15 August 2024
Yibo Jin
Tao Wang
Huimin Lin
Mingyang Song
Peiyang Li
Yipeng Ma
Yicheng Shan
Zhengfan Yuan
Cailong Li
Yajing Sun
Tiandeng Wu
Xing Chu
Ruizhi Huan
Li Ma
Xiao You
Wenting Zhou
Yunpeng Ye
Wen Liu
Xiangkun Xu
Yongsheng Zhang
Tiantian Dong
Jiawei Zhu
Zhe Wang
Xijian Ju
Jianxun Song
Haoliang Cheng
Xiaojing Li
Jiandong Ding
Hefei Guo
Zhengyong Zhang
MoE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"P/D-Serve: Serving Disaggregated Large Language Model at Scale"
7 / 7 papers shown
Title
Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen
J. Li
Yixin Ji
Z. Yang
Tong Liu
Qingrong Xia
Xinyu Duan
Z. Wang
Baoxing Huai
M. Zhang
LLMAG
77
0
0
28 Apr 2025
Seesaw: High-throughput LLM Inference via Model Re-sharding
Qidong Su
Wei Zhao
X. Li
Muralidhar Andoorveedu
Chenhao Jiang
Zhanda Zhu
Kevin Song
Christina Giannoula
Gennady Pekhimenko
LRM
70
0
0
09 Mar 2025
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
Yulei Qian
Fengcun Li
Xiangyang Ji
Xiaoyu Zhao
Jianchao Tan
K. Zhang
Xunliang Cai
MoE
68
2
0
16 Oct 2024
TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text
Songshuo Lu
Hua Wang
Yutian Rong
Zhi Chen
Yaohua Tang
VLM
28
11
0
10 Oct 2024
PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
Xiaozhe Ren
Pingyi Zhou
Xinfan Meng
Xinjing Huang
Yadao Wang
...
Jiansheng Wei
Xin Jiang
Teng Su
Qun Liu
Jun Yao
ALM
MoE
67
59
0
20 Mar 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
144
365
0
13 Mar 2023
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
1