AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

22 February 2023
Zhuohan Li
Lianmin Zheng
Yinmin Zhong
Vincent Liu
Ying Sheng
Xin Jin
Yanping Huang
Zhifeng Chen
Hao Zhang
Joseph E. Gonzalez
Ion Stoica
    MoE

Papers citing "AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving"

14 / 64 papers shown
Learned Best-Effort LLM Serving
Siddharth Jha
Coleman Hooper
Xiaoxuan Liu
Sehoon Kim
Kurt Keutzer
106
4
0
15 Jan 2024
OTAS: An Elastic Transformer Serving System via Token Adaptation
IEEE Conference on Computer Communications (INFOCOM), 2024
Jinyu Chen
Wenchao Xu
Zicong Hong
Song Guo
Yining Qi
Jie Zhang
Deze Zeng
194
4
0
10 Jan 2024
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Bin Lin
Chen Zhang
Tao Peng
Hanyu Zhao
Wencong Xiao
...
Shen Li
Zhigang Ji
Tao Xie
Yong Li
Jialin Li
302
76
0
05 Jan 2024
Training and Serving System of Foundation Models: A Comprehensive Survey
Jiahang Zhou
Yanyu Chen
Zicong Hong
Wuhui Chen
Yue Yu
Tao Zhang
Hui Wang
Chuan-fu Zhang
Zibin Zheng
ALM
223
14
0
05 Jan 2024
Fairness in Serving Large Language Models
USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2023
Ying Sheng
Shiyi Cao
Dacheng Li
Banghua Zhu
Zhuohan Li
Danyang Zhuo
Joseph E. Gonzalez
Ion Stoica
326
79
0
31 Dec 2023
SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads
Alind Khare
Dhruv Garg
Sukrit Kalra
Snigdha Grandhi
Ion Stoica
Alexey Tumanov
165
13
0
27 Dec 2023
DEAP: Design Space Exploration for DNN Accelerator Parallelism
Ekansh Agrawal
Xiangyu Sam Xu
188
1
0
24 Dec 2023
Splitwise: Efficient generative LLM inference using phase splitting
International Symposium on Computer Architecture (ISCA), 2023
Pratyush Patel
Esha Choukse
Chaojie Zhang
Aashaka Shah
Íñigo Goiri
Saeed Maleki
Ricardo Bianchini
269
447
0
30 Nov 2023
SpotServe: Serving Generative Large Language Models on Preemptible Instances
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2023
Xupeng Miao
Chunan Shi
Jiangfei Duan
Xiaoli Xi
Dahua Lin
Tengjiao Wang
Zhihao Jia
VLM
238
105
0
27 Nov 2023
HexGen: Generative Inference of Large Language Model over Heterogeneous Environment
Youhe Jiang
Ran Yan
Xiaozhe Yao
Yang Zhou
Beidi Chen
Binhang Yuan
SyDa
224
32
0
20 Nov 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Symposium on Operating Systems Principles (SOSP), 2023
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
1.6K
4,229
0
12 Sep 2023
Resource Management for GPT-based Model Deployed on Clouds: Challenges, Solutions, and Future Directions
International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP), 2023
Yongkang Dang
Minxian Xu
Kejiang Ye
102
2
0
05 Aug 2023
Computron: Serving Distributed Deep Learning Models with Model Parallel Swapping
Daniel Zou
X. Jin
Xueyang Yu
Haotian Zhang
J. Demmel
MoE
198
1
0
24 Jun 2023
Fast Distributed Inference Serving for Large Language Models
Bingyang Wu
Yinmin Zhong
Zili Zhang
Gang Huang
Xuanzhe Liu
Xin Jin
220
143
0
10 May 2023