ProMoE: Fast MoE-based LLM Serving using Proactive Caching

29 October 2024 · arXiv:2410.22134
Xiaoniu Song, Zihang Zhong, Rong Chen, Haibo Chen
MoE

Papers citing "ProMoE: Fast MoE-based LLM Serving using Proactive Caching"

Showing 4 of 4 papers

FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou, Zheng Li, J. Zhang, Jue Wang, Y. Wang, Zhongle Xie, Ke Chen, Lidan Shou
MoE · 09 May 2025

HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
Shuzhang Zhong, Y. Sun, Ling Liang, Runsheng Wang, R. Huang, Meng Li
MoE · 08 Apr 2025

MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching
Tairan Xu, Leyang Xue, Zhan Lu, Adrian Jackson, Luo Mai
MoE · 12 Mar 2025

fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving
Hanfei Yu, Xingqi Cui, H. M. Zhang, H. Wang, Hao Wang
MoE · 07 Feb 2025