Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction
arXiv:2406.04785 · 7 June 2024
Ke Cheng, Wen Hu, Zhi Wang, Peng Du, Jianguo Li, Sheng Zhang
Papers citing "Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction" (8 papers):
Taming the Titans: A Survey of Efficient LLM Inference Serving (28 Apr 2025)
Ranran Zhen, J. Li, Yixin Ji, Z. Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Z. Wang, Baoxing Huai, M. Zhang
Queueing, Predictions, and LLMs: Challenges and Open Problems (10 Mar 2025)
Michael Mitzenmacher, Rana Shahout
Multi-Bin Batching for Increasing LLM Inference Throughput (3 Dec 2024)
Ozgur Guldogan, Jackson Kunde, Kangwook Lee, Ramtin Pedarsani
Software Performance Engineering for Foundation Model-Powered Software (FMware) (14 Nov 2024)
Haoxiang Zhang, Shi Chang, Arthur Leung, Kishanthan Thangarajah, Boyuan Chen, Hanan Lutfiyya, Ahmed E. Hassan
Don't Stop Me Now: Embedding Based Scheduling for LLMs (1 Oct 2024)
Rana Shahout, Eran Malach, Chunwei Liu, Weifan Jiang, Minlan Yu, Michael Mitzenmacher
Efficient LLM Scheduling by Learning to Rank (28 Aug 2024)
Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang
Text Detoxification using Large Pre-trained Neural Models (18 Sep 2021)
David Dale, Anton Voronov, Daryna Dementieva, V. Logacheva, Olga Kozlova, Nikita Semenov, Alexander Panchenko
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation (9 Feb 2021)
Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, ..., Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu