FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

29 February 2024
Xupeng Miao
Gabriele Oliaro
Xinhao Cheng
Vineeth Kada
Ruohan Gao
Yingyi Huang
Remi Delacourt
April Yang
Yingcheng Wang
Mengdi Wu
Colin Unger
Zhihao Jia
    MoE

Papers citing "FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning"

12 / 12 papers shown
Alchemist: Towards the Design of Efficient Online Continual Learning System
Yuyang Huang
Yuhan Liu
Haryadi S. Gunawi
Beibin Li
Changho Hwang
CLL
OnRL
72
4
0
03 Mar 2025
Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning
Bei Ouyang
Shengyuan Ye
Liekang Zeng
Tianyi Qian
Jingyi Li
Xu Chen
18
1
0
20 Aug 2024
LLM Inference Serving: Survey of Recent Advances and Opportunities
Baolin Li
Yankai Jiang
V. Gadepally
Devesh Tiwari
46
2
0
17 Jul 2024
Snake Learning: A Communication- and Computation-Efficient Distributed Learning Framework for 6G
Xiaoxue Yu
Xingfu Yi
Rongpeng Li
Fei Wang
Chenghui Peng
Zhifeng Zhao
Honggang Zhang
28
1
0
06 May 2024
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
112
198
0
13 Mar 2023
Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision
Wei Gao
Qi Hu
Zhisheng Ye
Peng Sun
Xiaolin Wang
Yingwei Luo
Tianwei Zhang
Yonggang Wen
40
17
0
24 May 2022
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
267
2,978
0
18 Apr 2021
What Makes Good In-Context Examples for GPT-3?
Jiachang Liu
Dinghan Shen
Yizhe Zhang
Bill Dolan
Lawrence Carin
Weizhu Chen
AAML
RALM
263
991
0
17 Jan 2021
Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider
Mohammad Shahrad
Rodrigo Fonseca
Íñigo Goiri
G. Chaudhry
Paul Batum
Jason Cooke
Eduardo Laureano
Colby Tresness
M. Russinovich
Ricardo Bianchini
36
466
0
06 Mar 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
237
1,375
0
21 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
237
1,412
0
17 Sep 2019
Neural Architecture Search with Reinforcement Learning
Barret Zoph
Quoc V. Le
257
5,034
0
05 Nov 2016