Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness
arXiv:2310.02410, 3 October 2023
Young Jin Kim, Raffy Fahim, Hany Awadalla
Tags: MQ, MoE
Papers citing "Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness" (4 of 4 papers shown)
D²MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
Haodong Wang, Qihua Zhou, Zicong Hong, Song Guo
Tags: MoE
17 Apr 2025
CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory
Jiashun Suo, Xiaojian Liao, Limin Xiao, Li Ruan, Jinquan Wang, Xiao Su, Zhisheng Huo
04 Mar 2025
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
Tags: MoE
24 Sep 2021
Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim, A. A. Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Awadalla
Tags: MoE
22 Sep 2021