S-LoRA: Serving Thousands of Concurrent LoRA Adapters
arXiv: 2311.03285 · 6 November 2023
Authors: Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica
Tags: MoE
Papers citing "S-LoRA: Serving Thousands of Concurrent LoRA Adapters" (18 of 18 papers shown)
1. Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving (06 May 2025)
   Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Y. Li, ..., Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng
   Citations: 0

2. HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models (05 May 2025)
   Zheng Lin, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Praneeth Vepakomma, Wei Ni, Jun-Jie Luo, Yue Gao
   Tags: MoE · Citations: 1

3. A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic Data (21 Jan 2025)
   Minh Tran, Yutong Pang, Debjyoti Paul, Laxmi Pandey, Kevin Jiang, Jinxi Guo, Ke Li, Shun Zhang, X. Zhang, Xin Lei
   Tags: AI4CE · Citations: 0

4. Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA (28 Oct 2024)
   Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster
   Tags: KELM · Citations: 5

5. Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs (23 Oct 2024)
   Ferdi Kossmann, Bruce Fontaine, Daya Khudia, Michael Cafarella, Samuel Madden
   Citations: 2

6. EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models (20 Oct 2024)
   Junhao Hu, Wenrui Huang, H. Wang, Weidong Wang, Tiancheng Hu, Qin Zhang, Hao Feng, Xusheng Chen, Yizhou Shan, Tao Xie
   Tags: RALM, LLMAG · Citations: 2

7. LoRTA: Low Rank Tensor Adaptation of Large Language Models (05 Oct 2024)
   Ignacio Hounie, Charilaos I. Kanatsoulis, Arnuv Tandon, Alejandro Ribeiro
   Citations: 0

8. DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models (02 Oct 2024)
   Yuxuan Zhang, Ruizhe Li
   Tags: MoMe · Citations: 0

9. Evaluating Zero-Shot Long-Context LLM Compression (10 Jun 2024)
   Chenyu Wang, Yihan Wang, Kai Li
   Citations: 0

10. GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (06 Mar 2024)
    Jiawei Zhao, Zhenyu (Allen) Zhang, Beidi Chen, Zhangyang Wang, A. Anandkumar, Yuandong Tian
    Citations: 173

11. FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning (29 Feb 2024)
    Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Vineeth Kada, Ruohan Gao, ..., April Yang, Yingcheng Wang, Mengdi Wu, Colin Unger, Zhihao Jia
    Tags: MoE · Citations: 9

12. LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild (15 Feb 2024)
    Ziyu Zhao, Leilei Gan, Guoyin Wang, Wangchunshu Zhou, Hongxia Yang, Kun Kuang, Fei Wu
    Tags: MoMe · Citations: 28

13. Institutional Platform for Secure Self-Service Large Language Model Exploration (01 Feb 2024)
    V. Bumgardner, Mitchell A. Klusty, W. V. Logan, Samuel E. Armstrong, Caylin D. Hickey, Jeff Talbert
    Citations: 1

14. FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU (13 Mar 2023)
    Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
    Citations: 366

15. Training language models to follow instructions with human feedback (04 Mar 2022)
    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
    Tags: OSLM, ALM · Citations: 11,881

16. P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks (14 Oct 2021)
    Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, Jie Tang
    Tags: VLM · Citations: 804

17. The Power of Scale for Parameter-Efficient Prompt Tuning (18 Apr 2021)
    Brian Lester, Rami Al-Rfou, Noah Constant
    Tags: VPVLM · Citations: 3,835

18. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (17 Sep 2019)
    M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
    Tags: MoE · Citations: 1,815