2401.14351
Cited By
ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models
25 January 2024
Yao Fu
Leyang Xue
Yeqi Huang
Andrei-Octavian Brabete
Dmitrii Ustiugov
Yuvraj Patel
Luo Mai
Papers citing "ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models"
9 / 9 papers shown
Title
iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos
Tianrui Hu
Prasoon Sinha
N. Yadwadkar
VLM
179
0
0
08 Jan 2025
SkyServe: Serving AI Models across Regions and Clouds with Spot Instances
Ziming Mao
Tian Xia
Zhanghao Wu
Wei-Lin Chiang
Tyler Griggs
Romil Bhardwaj
Zongheng Yang
S. Shenker
Ion Stoica
56
2
0
03 Nov 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
42
43
0
09 Jul 2024
PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation
Zhuobin Huang
Xingda Wei
Yingyi Hao
Rong Chen
Mingcong Han
Jinyu Gu
Haibo Chen
24
4
0
20 May 2024
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
149
368
0
13 Mar 2023
FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute
Ao Wang
Shuai Chang
Huangshi Tian
Hongqi Wang
Haoran Yang
Huiba Li
Rui Du
Yue Cheng
38
104
0
24 May 2021
Faa$T: A Transparent Auto-Scaling Cache for Serverless Applications
Francisco Romero
G. Chaudhry
Íñigo Goiri
Pragna Gopa
Paul Batum
N. Yadwadkar
Rodrigo Fonseca
Christos Kozyrakis
Ricardo Bianchini
58
111
0
28 Apr 2021
Benchmarking, Analysis, and Optimization of Serverless Function Snapshots
Dmitrii Ustiugov
Plamen Petrov
Marios Kogias
Edouard Bugnion
Boris Grot
44
171
0
16 Jan 2021
Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider
Mohammad Shahrad
Rodrigo Fonseca
Íñigo Goiri
G. Chaudhry
Paul Batum
Jason Cooke
Eduardo Laureano
Colby Tresness
M. Russinovich
Ricardo Bianchini
81
601
0
06 Mar 2020