Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.01136
Cited By
LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization
2 March 2024
Juntao Zhao
Borui Wan
Yanghua Peng
Haibin Lin
Chuan Wu
MQ
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization"
11 / 11 papers shown
Title
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
Shan Yu
Jiarong Xing
Yifan Qiao
Mingyuan Ma
Y. Li
...
Shiyi Cao
Ke Bao
Ion Stoica
Harry Xu
Ying Sheng
29
0
0
06 May 2025
Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen
J. Li
Yixin Ji
Z. Yang
Tong Liu
Qingrong Xia
Xinyu Duan
Z. Wang
Baoxing Huai
M. Zhang
LLMAG
77
0
0
28 Apr 2025
Efficient LLM Serving on Hybrid Real-time and Best-effort Requests
Wan Borui
Zhao Juntao
Jiang Chenyu
Guo Chuanxiong
Wu Chuan
VLM
74
1
0
13 Apr 2025
Seesaw: High-throughput LLM Inference via Model Re-sharding
Qidong Su
Wei Zhao
X. Li
Muralidhar Andoorveedu
Chenhao Jiang
Zhanda Zhu
Kevin Song
Christina Giannoula
Gennady Pekhimenko
LRM
72
0
0
09 Mar 2025
DILEMMA: Joint LLM Quantization and Distributed LLM Inference Over Edge Computing Systems
Minoo Hosseinzadeh
Hana Khamfroush
70
0
0
03 Mar 2025
QSpec: Speculative Decoding with Complementary Quantization Schemes
Juntao Zhao
Wenhao Lu
Sheng Wang
Lingpeng Kong
Chuan Wu
MQ
64
5
0
15 Oct 2024
InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference
Xiurui Pan
Endian Li
Qiao Li
Shengwen Liang
Yizhou Shan
Ke Zhou
Yingwei Luo
Xiaolin Wang
Jie Zhang
43
10
0
08 Sep 2024
On-Device Language Models: A Comprehensive Review
Jiajun Xu
Zhiyuan Li
Wei Chen
Qun Wang
Xin Gao
Qi Cai
Ziyuan Ling
39
27
0
26 Aug 2024
Hybrid Heterogeneous Clusters Can Lower the Energy Consumption of LLM Inference Workloads
Grant Wilkins
Srinivasan Keshav
Richard Mortier
29
9
0
25 Apr 2024
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
144
366
0
13 Mar 2023
EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models
Jiangsu Du
Ziming Liu
Jiarui Fang
Shenggui Li
Yongbin Li
Yutong Lu
Yang You
MoE
27
4
0
06 Sep 2022
1