2404.14527
Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity
22 April 2024
Tyler Griggs
Xiaoxuan Liu
Jiaxiang Yu
Doyoung Kim
Wei-Lin Chiang
Alvin Cheung
Ion Stoica
Papers citing
"Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity"
9 / 9 papers shown
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
Shan Yu
Jiarong Xing
Yifan Qiao
Mingyuan Ma
Y. Li
...
Shiyi Cao
Ke Bao
Ion Stoica
Harry Xu
Ying Sheng
24
0
0
06 May 2025
Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen
J. Li
Yixin Ji
Z. Yang
Tong Liu
Qingrong Xia
Xinyu Duan
Z. Wang
Baoxing Huai
M. Zhang
LLMAG
77
0
0
28 Apr 2025
Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
43
3
0
06 Mar 2025
iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos
Tianrui Hu
Prasoon Sinha
N. Yadwadkar
VLM
71
0
0
08 Jan 2025
Software Performance Engineering for Foundation Model-Powered Software (FMware)
Haoxiang Zhang
Shi Chang
Arthur Leung
Kishanthan Thangarajah
Boyuan Chen
Hanan Lutfiyya
Ahmed E. Hassan
52
0
0
14 Nov 2024
A Strategy to Combine 1stGen Transformers and Open LLMs for Automatic Text Classification
Claudio Andrade
Washington Cunha
Davi Reis
Adriana S. Pagano
Leonardo Rocha
Marcos André Gonçalves
22
2
0
19 Aug 2024
LLM Inference Serving: Survey of Recent Advances and Opportunities
Baolin Li
Yankai Jiang
V. Gadepally
Devesh Tiwari
73
15
0
17 Jul 2024
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
Ramya Prabhu
Ajay Nayak
Jayashree Mohan
R. Ramjee
Ashish Panwar
VLM
50
24
0
07 May 2024
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
249
1,982
0
28 Jul 2020