FastMoE: A Fast Mixture-of-Expert Training System

24 March 2021
Jiaao He, J. Qiu, Aohan Zeng, Zhilin Yang, Jidong Zhai, Jie Tang
ALM · MoE

Papers citing "FastMoE: A Fast Mixture-of-Expert Training System"

13 of 63 citing papers shown:
Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models
Ze-Feng Gao, Peiyu Liu, Wayne Xin Zhao, Zhong-Yi Lu, Ji-Rong Wen
MoE · 02 Mar 2022

A Survey on Dynamic Neural Networks for Natural Language Processing
Canwen Xu, Julian McAuley
AI4CE · 15 Feb 2022

CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting
Gerald Woo, Chenghao Liu, Doyen Sahoo, Akshat Kumar, Steven C. H. Hoi
AI4TS · 03 Feb 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He
14 Jan 2022

SpeechMoE2: Mixture-of-Experts Model with Improved Routing
Zhao You, Shulin Feng, Dan Su, Dong Yu
MoE · 23 Nov 2021

Transformer-S2A: Robust and Efficient Speech-to-Animation
Liyang Chen, Zhiyong Wu, Jun Ling, Runnan Li, Xu Tan, Sheng Zhao
18 Nov 2021

MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
MoE · 05 Oct 2021

CPM-2: Large-scale Cost-effective Pre-trained Language Models
Zhengyan Zhang, Yuxian Gu, Xu Han, Shengqi Chen, Chaojun Xiao, ..., Minlie Huang, Wentao Han, Yang Liu, Xiaoyan Zhu, Maosong Sun
MoE · 20 Jun 2021

Pre-Trained Models: Past, Present and Future
Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, ..., Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu
AIFin · MQ · AI4MH · 14 Jun 2021

M6-T: Exploring Sparse Expert Models and Beyond
An Yang, Junyang Lin, Rui Men, Chang Zhou, Le Jiang, ..., Dingyang Zhang, Wei Lin, Lin Qu, Jingren Zhou, Hongxia Yang
MoE · 31 May 2021

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 17 Sep 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 20 Apr 2018