SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget
arXiv:2308.15030 · Cited By
29 August 2023
Rui Kong, Yuanchun Li, Qingtian Feng, Weijun Wang, Xiaozhou Ye, Ye Ouyang, L. Kong, Yunxin Liu
MoE

Papers citing "SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget" (12 of 12 papers shown)
  • FloE: On-the-Fly MoE Inference on Memory-constrained GPU
    Yuxin Zhou, Zheng Li, J. Zhang, Jue Wang, Y. Wang, Zhongle Xie, Ke Chen, Lidan Shou
    09 May 2025 · MoE
  • CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory
    Jiashun Suo, Xiaojian Liao, Limin Xiao, Li Ruan, Jinquan Wang, Xiao Su, Zhisheng Huo
    04 Mar 2025
  • fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving
    Hanfei Yu, Xingqi Cui, H. M. Zhang, H. Wang, Hao Wang
    07 Feb 2025 · MoE
  • HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference
    Peng Tang, Jiacheng Liu, X. Hou, Yifei Pu, Jing Wang, Pheng-Ann Heng, C. Li, M. Guo
    03 Nov 2024 · MoE
  • ProMoE: Fast MoE-based LLM Serving using Proactive Caching
    Xiaoniu Song, Zihang Zhong, Rong Chen, Haibo Chen
    29 Oct 2024 · MoE
  • ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection
    Kai Huang, Boyuan Yang, Wei Gao
    21 Dec 2023
  • AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments
    Hao Wen, Yuanchun Li, Zunshuai Zhang, Shiqi Jiang, Xiaozhou Ye, Ouyang Ye, Yaqin Zhang, Yunxin Liu
    13 Mar 2023
  • Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
    Haiyang Huang, Newsha Ardalani, Anna Y. Sun, Liu Ke, Hsien-Hsin S. Lee, Anjali Sridhar, Shruti Bhosale, Carole-Jean Wu, Benjamin C. Lee
    10 Mar 2023 · MoE
  • Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
    Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
    24 Sep 2021 · MoE
  • Scalable and Efficient MoE Training for Multitask Multilingual Models
    Young Jin Kim, A. A. Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Awadalla
    22 Sep 2021 · MoE
  • Scaling Laws for Neural Language Models
    Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
    23 Jan 2020
  • ImageNet Large Scale Visual Recognition Challenge
    Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
    01 Sep 2014 · VLM, ObjD