DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
arXiv 2401.06066 · 11 January 2024
Damai Dai, Chengqi Deng, Chenggang Zhao, R. X. Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Yu-Huan Wu, Zhenda Xie, Y. K. Li, Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, W. Liang
Tags: MoE
Papers citing "DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models" (4 of 54 papers shown)
PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
Xiaozhe Ren, Pingyi Zhou, Xinfan Meng, Xinjing Huang, Yadao Wang, ..., Jiansheng Wei, Xin Jiang, Teng Su, Qun Liu, Jun Yao
Tags: ALM, MoE · 20 Mar 2023
Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
Tags: MoE · 18 Feb 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
Tags: AIMat · 31 Dec 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Tags: MoE · 17 Sep 2019