HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference. Design Automation Conference (DAC), 2025.
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models. International Conference on Learning Representations (ICLR), 2024.