ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching and Token Scheduling

23 October 2024
Xin He
Shunkang Zhang
Yuxin Wang
Haiyan Yin
Zihao Zeng
Shaohuai Shi
Zhenheng Tang
Xiaowen Chu
Ivor Tsang
Yew Soon Ong
MoE
arXiv: 2410.17954 (abs · PDF · HTML) · GitHub

Papers citing "ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching and Token Scheduling"

8 / 8 papers shown
xLLM Technical Report
T. Liu
Tao Peng
Peijun Yang
X. Zhao
Xiusheng Lu
...
Hailong Yang
Jing-Jing Li
Guiguang Ding
Ke Zhang
16 Oct 2025
Accelerating Mixture-of-Expert Inference with Adaptive Expert Split Mechanism
Jiaming Yan
Jianchun Liu
Hongli Xu
Liusheng Huang
MoE
10 Sep 2025
SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference
Qian Chen
Xianhao Chen
Kaibin Huang
MoE
09 Jul 2025
Brain-Like Processing Pathways Form in Models With Heterogeneous Experts
Jack Cook
Danyal Akarca
Rui Ponte Costa
Jascha Achterberg
MoE
03 Jun 2025
Advancing Expert Specialization for Better MoE
Hongcan Guo
Haolang Lu
Guoshun Nan
Bolun Chu
Jialin Zhuang
...
Wenhao Che
Sicong Leng
Qimei Cui
Xudong Jiang
MoE, MoMe
28 May 2025
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
Jingcong Liang
Siyuan Wang
Miren Tian
Yitong Li
Duyu Tang
Zhongyu Wei
MoE
21 May 2025
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
Design Automation Conference (DAC), 2025
Shuzhang Zhong
Yizhou Sun
Ling Liang
Runsheng Wang
R. Huang
Meng Li
MoE
08 Apr 2025
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
International Conference on Learning Representations (ICLR), 2024
Keisuke Kamahori
Tian Tang
Yile Gu
Kan Zhu
Baris Kasikci
10 Feb 2024