Punica: Multi-Tenant LoRA Serving

28 October 2023
Lequn Chen
Zihao Ye
Yongji Wu
Danyang Zhuo
Luis Ceze
Arvind Krishnamurthy

Papers citing "Punica: Multi-Tenant LoRA Serving"

24 / 24 papers shown
HMI: Hierarchical Knowledge Management for Efficient Multi-Tenant Inference in Pretrained Language Models
J. Zhang
J. Wang
H. Li
Lidan Shou
Ke Chen
Gang Chen
Qin Xie
Guiming Xie
Xuejian Gong
28
0
0
24 Apr 2025
Improving the Serving Performance of Multi-LoRA Large Language Models via Efficient LoRA and KV Cache Management
Hang Zhang
Jiuchen Shi
Yixiao Wang
Quan Chen
Yizhou Shan
Minyi Guo
25
0
0
19 Apr 2025
LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits
Zikai Zhou
Qizheng Zhang
Hermann Kumbong
Kunle Olukotun
MQ
163
0
0
12 Feb 2025
HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location
Ting Sun
Penghan Wang
Fan Lai
83
1
0
15 Jan 2025
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Haozhao Wang
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
112
1
0
18 Dec 2024
Quantized Delta Weight Is Safety Keeper
Yule Liu
Zhen Sun
Xinlei He
Xinyi Huang
80
2
0
29 Nov 2024
AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution
Zhiqiang Xie
Hao Kang
Ying Sheng
Tushar Krishna
Kayvon Fatahalian
Christos Kozyrakis
LRM
AI4CE
LLMAG
LM&Ro
35
1
0
05 Nov 2024
AdaptGCD: Multi-Expert Adapter Tuning for Generalized Category Discovery
Yuxun Qu
Yongqiang Tang
Chenyang Zhang
Wensheng Zhang
24
0
0
29 Oct 2024
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Seungyeon Kim
Tal Schuster
KELM
68
5
0
28 Oct 2024
MoIN: Mixture of Introvert Experts to Upcycle an LLM
Ajinkya Tejankar
K. Navaneet
Ujjawal Panchal
Kossar Pourahmadi
Hamed Pirsiavash
MoE
29
0
0
13 Oct 2024
MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards
Sheng Wang
Liheng Chen
Pengan Chen
Jingwei Dong
Boyang Xue
Jiyue Jiang
Lingpeng Kong
Chuan Wu
MoE
24
7
0
01 Oct 2024
PEDRO: Parameter-Efficient Fine-tuning with Prompt DEpenDent Representation MOdification
Tianfang Xie
Tianjing Li
Wei Zhu
Wei Han
Yi Zhao
22
5
0
26 Sep 2024
Post-Training Sparse Attention with Double Sparsity
Shuo Yang
Ying Sheng
Joseph E. Gonzalez
Ion Stoica
Lianmin Zheng
28
7
0
11 Aug 2024
A Survey on LoRA of Large Language Models
Yuren Mao
Yuhang Ge
Yijiang Fan
Wenyi Xu
Yu Mi
Zhonghao Hu
Yunjun Gao
ALM
52
23
0
08 Jul 2024
SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules
Suyi Li
Lingyun Yang
Xiaoxiao Jiang
Hanfeng Lu
Zhipeng Di
...
Tao Lan
Guodong Yang
Lin Qu
Liping Zhang
Wei Wang
23
2
0
02 Jul 2024
TorchOpera: A Compound AI System for LLM Safety
Shanshan Han
Yuhang Yao
Zijian Hu
Dimitris Stripelis
Zhaozhuo Xu
Chaoyang He
LLMAG
36
0
0
16 Jun 2024
ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models
Jing Liu
Ruihao Gong
Mingyang Zhang
Yefei He
Jianfei Cai
Bohan Zhuang
MoE
37
0
0
13 Jun 2024
Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution
Yechen Xu
Xinhao Kong
Tingjun Chen
Danyang Zhuo
LLMAG
22
2
0
29 May 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han
Chao Gao
Jinyang Liu
Jeff Zhang
Sai Qian Zhang
139
305
0
21 Mar 2024
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
Xupeng Miao
Gabriele Oliaro
Xinhao Cheng
Vineeth Kada
Ruohan Gao
...
April Yang
Yingcheng Wang
Mengdi Wu
Colin Unger
Zhihao Jia
MoE
88
9
0
29 Feb 2024
Institutional Platform for Secure Self-Service Large Language Model Exploration
V. Bumgardner
Mitchell A. Klusty
W. V. Logan
Samuel E. Armstrong
Caylin D. Hickey
Jeff Talbert
44
1
0
01 Feb 2024
Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native
Yao Lu
Song Bian
Lequn Chen
Yongjun He
Yulong Hui
...
Huanchen Zhang
Minjia Zhang
Qizhen Zhang
Tianyi Zhou
Danyang Zhuo
13
7
0
17 Jan 2024
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
144
365
0
13 Mar 2023
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,815
0
17 Sep 2019