Mixture-of-Experts with Expert Choice Routing
18 February 2022 · arXiv:2202.09368
Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
MoE
Papers citing "Mixture-of-Experts with Expert Choice Routing" (9 of 9 papers shown)
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber
MoE, VLM · 59 · 0 · 0 · 01 May 2025
MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification
Yichu Xu, Di Wang, Hongzan Jiao, L. Zhang, L. Zhang
Mamba · 61 · 0 · 0 · 29 Apr 2025
Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen, J. Li, Yixin Ji, Z. Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Z. Wang, Baoxing Huai, M. Zhang
LLMAG · 54 · 101 · 0 · 28 Apr 2025
Accelerating Mixture-of-Experts Training with Adaptive Expert Replication
Athinagoras Skiadopoulos, Mark Zhao, Swapnil Gandhi, Thomas Norrie, Shrijeet Mukherjee, Christos Kozyrakis
MoE · 68 · 50 · 0 · 28 Apr 2025
BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
Qingyue Wang, Qi Pang, Xixun Lin, Shuai Wang, Daoyuan Wu
MoE · 39 · 0 · 0 · 24 Apr 2025
Tricks for Training Sparse Translation Models
Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, M. Lewis, Angela Fan
MoE · 110 · 18 · 0 · 15 Oct 2021
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
215 · 3,054 · 0 · 23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 224 · 1,436 · 0 · 17 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 275 · 6,003 · 0 · 20 Apr 2018