ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching and Token Scheduling

arXiv:2410.17954, versions v1 and v2 (latest)

23 October 2024

Yew Soon Ong

Papers citing "ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching and Token Scheduling"

8 / 8 papers shown

xLLM Technical Report

...

Ke Zhang

217

2

0

16 Oct 2025

Accelerating Mixture-of-Expert Inference with Adaptive Expert Split Mechanism

188

6

0

10 Sep 2025

SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference

341

4

0

09 Jul 2025

Brain-Like Processing Pathways Form in Models With Heterogeneous Experts

Rui Ponte Costa

Jascha Achterberg

435

4

0

03 Jun 2025

Advancing Expert Specialization for Better MoE

...

Xudong Jiang

565

23

0

28 May 2025

Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models

399

2

0

21 May 2025

HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference

Design Automation Conference (DAC), 2025

301

11

0

08 Apr 2025

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

International Conference on Learning Representations (ICLR), 2024

Keisuke Kamahori

575

55

0

10 Feb 2024
