arXiv:2211.10017
Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production
18 November 2022
Young Jin Kim
Rawn Henry
Raffy Fahim
Hany Awadalla
MoE
Papers citing "Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production"
5 / 5 papers shown
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu
Xiuhong Li
Zhihang Yuan
Size Zheng
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
MoE
137
0
0
09 May 2025
TileLang: A Composable Tiled Programming Model for AI Systems
Lei Wang
Yu Cheng
Yining Shi
Zhengju Tang
Zhiwen Mo
...
Lingxiao Ma
Yuqing Xia
Jilong Xue
Fan Yang
Z. Yang
56
1
0
24 Apr 2025
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
Yulei Qian
Fengcun Li
Xiangyang Ji
Xiaoyu Zhao
Jianchao Tan
K. Zhang
Xunliang Cai
MoE
68
3
0
16 Oct 2024
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Yujun Lin
Haotian Tang
Shang Yang
Zhekai Zhang
Guangxuan Xiao
Chuang Gan
Song Han
77
76
0
07 May 2024
Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim
A. A. Awan
Alexandre Muzio
Andres Felipe Cruz Salinas
Liyang Lu
Amr Hendy
Samyam Rajbhandari
Yuxiong He
Hany Awadalla
MoE
94
84
0
22 Sep 2021