arXiv: 2410.22134
Cited By
ProMoE: Fast MoE-based LLM Serving using Proactive Caching
29 October 2024
Xiaoniu Song, Zihang Zhong, Rong Chen, Haibo Chen
Papers citing "ProMoE: Fast MoE-based LLM Serving using Proactive Caching" (4 of 4 papers shown)
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou, Zheng Li, J. Zhang, Jue Wang, Y. Wang, Zhongle Xie, Ke Chen, Lidan Shou
09 May 2025

HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
Shuzhang Zhong, Y. Sun, Ling Liang, Runsheng Wang, R. Huang, Meng Li
08 Apr 2025

MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching
Tairan Xu, Leyang Xue, Zhan Lu, Adrian Jackson, Luo Mai
12 Mar 2025

fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving
Hanfei Yu, Xingqi Cui, H. M. Zhang, H. Wang, Hao Wang
07 Feb 2025