Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
arXiv:2305.13144 · 22 May 2023
Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You
Papers citing "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline" (33 papers):
Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen, J. Li, Yixin Ji, Z. Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Z. Wang, Baoxing Huai, M. Zhang (28 Apr 2025)

An Empirical Study on Prompt Compression for Large Language Models
Z. Zhang, Jinyi Li, Yihuai Lan, X. Wang, Hao Wang (24 Apr 2025)

High-Throughput LLM inference on Heterogeneous Clusters
Yi Xiong, Jinqi Huang, Wenjie Huang, Xuebing Yu, Entong Li, Zhixiong Ning, Jinhua Zhou, Li Zeng, Xin Chen (18 Apr 2025)

Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation
Jingzhi Fang, Yanyan Shen, Y. Wang, Lei Chen (21 Mar 2025)

AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications
Haiying Shen, Tanmoy Sen (17 Mar 2025)

Mitigating KV Cache Competition to Enhance User Experience in LLM Inference
Haiying Shen, Tanmoy Sen, Masahiro Tanaka (17 Mar 2025)

Queueing, Predictions, and LLMs: Challenges and Open Problems
Michael Mitzenmacher, Rana Shahout (10 Mar 2025)

EEG Emotion Copilot: Optimizing Lightweight LLMs for Emotional EEG Interpretation with Assisted Medical Record Generation
Hongyu Chen, Weiming Zeng, C. L. P. Chen, Luhui Cai, Fei-Yue Wang, ..., Wei Zhang, Y. Li, Hongjie Yan, W. Siok, Nizhuan Wang (08 Jan 2025)

Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, ..., Quan Wan, Haozhao Wang, Yunfeng Fan, Qinliang Su, Xuemin Shen (18 Dec 2024)

ALISE: Accelerating Large Language Model Serving with Speculative Scheduling
Youpeng Zhao, Jun Wang (31 Oct 2024)

BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching
Peizhuang Cong, Qizhi Chen, Haochen Zhao, Tong Yang (24 Oct 2024)

Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs
Ferdi Kossmann, Bruce Fontaine, Daya Khudia, Michael Cafarella, Samuel Madden (23 Oct 2024)

Don't Stop Me Now: Embedding Based Scheduling for LLMs
Rana Shahout, Eran Malach, Chunwei Liu, Weifan Jiang, Minlan Yu, Michael Mitzenmacher (01 Oct 2024)

Efficient LLM Scheduling by Learning to Rank
Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang (28 Aug 2024)

On-Device Language Models: A Comprehensive Review
Jiajun Xu, Zhiyuan Li, Wei Chen, Qun Wang, Xin Gao, Qi Cai, Ziyuan Ling (26 Aug 2024)

DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency
Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Josep Torrellas, Esha Choukse (01 Aug 2024)

Impact of Model Size on Fine-tuned LLM Performance in Data-to-Text Generation: A State-of-the-Art Investigation
Joy Mahapatra, Utpal Garain (19 Jul 2024)

LLM Inference Serving: Survey of Recent Advances and Opportunities
Baolin Li, Yankai Jiang, V. Gadepally, Devesh Tiwari (17 Jul 2024)

Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems
Grant Wilkins, Srinivasan Keshav, Richard Mortier (04 Jul 2024)

Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
Ke Cheng, Wen Hu, Zhi Wang, Hongen Peng, Jianguo Li, Sheng Zhang (19 Jun 2024)

Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction
Ke Cheng, Wen Hu, Zhi Wang, Peng Du, Jianguo Li, Sheng Zhang (07 Jun 2024)

Distributed Inference Performance Optimization for LLMs on CPUs
Pujiang He, Shan Zhou, Changqing Li, Wenhuan Huang, Weifei Yu, Duyi Wang, Chen Meng, Sheng Gui (16 May 2024)

Aladdin: Joint Placement and Scaling for SLO-Aware LLM Serving
Chengyi Nie, Rodrigo Fonseca, Zhenhua Liu (11 May 2024)

Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica (22 Apr 2024)

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
Haoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Saurabh Jha, Chen Wang, Hubertus Franke, Zbigniew T. Kalbarczyk, Tamer Basar, Ravishankar K. Iyer (12 Apr 2024)

BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems
Yuxin Wang, Yuhan Chen, Zeyu Li, Xueze Kang, Zhenheng Tang, ..., Rui Guo, Xin Wang, Qiang-qiang Wang, Amelie Chi Zhou, Xiaowen Chu (31 Jan 2024)

Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads
Cunchen Hu, Heyang Huang, Liangliang Xu, Xusheng Chen, Jiang Xu, ..., Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan (20 Jan 2024)

Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, ..., Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu (22 Dec 2023)

From Query Tools to Causal Architects: Harnessing Large Language Models for Advanced Causal Discovery from Data
Taiyu Ban, Lyvzhou Chen, Xiangyu Wang, Huanhuan Chen (29 Jun 2023)

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang (13 Mar 2023)

Leveraging Large Language Models for Multiple Choice Question Answering
Joshua Robinson, Christopher Rytting, David Wingate (22 Oct 2022)

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe (04 Mar 2022)

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro (17 Sep 2019)