Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
arXiv:2401.08383 · 16 January 2024
Jinghan Yao, Quentin G. Anthony, A. Shafi, Hari Subramoni, Dhabaleswar K. Panda
MoE

Papers citing "Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference" (11 of 11 papers shown)

Accelerating MoE Model Inference with Expert Sharding
Oana Balmau, Anne-Marie Kermarrec, Rafael Pires, André Loureiro Espírito Santo, M. Vos, Milos Vujasinovic
MoE · 0 citations · 11 Mar 2025

A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu, Sen Lin
MoE · 1 citation · 10 Mar 2025

Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling
Yan Li, Pengfei Zheng, Shuang Chen, Zewei Xu, Yuanhao Lai, Yunfei Du, Z. Wang
MoE · 0 citations · 06 Mar 2025

CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory
Jiashun Suo, Xiaojian Liao, Limin Xiao, Li Ruan, Jinquan Wang, Xiao Su, Zhisheng Huo
0 citations · 04 Mar 2025

MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing
Seokjin Go, Divya Mahajan
MoE · 0 citations · 10 Feb 2025

LLM Inference Serving: Survey of Recent Advances and Opportunities
Baolin Li, Yankai Jiang, V. Gadepally, Devesh Tiwari
18 citations · 17 Jul 2024

Tutel: Adaptive Mixture-of-Experts at Scale
Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, ..., Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Y. Xiong
MoE · 109 citations · 07 Jun 2022

Mixture-of-Experts with Expert Choice Routing
Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
MoE · 327 citations · 18 Feb 2022

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste
MQ · 684 citations · 31 Jan 2021

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat · 1,986 citations · 31 Dec 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 1,817 citations · 17 Sep 2019