Taming Sparsely Activated Transformer with Stochastic Experts
Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, T. Zhao, Jianfeng Gao
[MoE] 8 October 2021

Papers citing "Taming Sparsely Activated Transformer with Stochastic Experts" (33 of 83 papers shown)

Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Yongqian Huang, Peng Ye, Xiaoshui Huang, Sheng R. Li, Tao Chen, Tong He, Wanli Ouyang
[MoMe] 11 Aug 2023

FedJETs: Efficient Just-In-Time Personalization with Federated Mixture of Experts
Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Robert Sim, Anastasios Kyrillidis, Dimitrios Dimitriadis
[FedML, MoE] 14 Jun 2023

Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models' Memories
Shizhe Diao, Tianyang Xu, Ruijia Xu, Jiawei Wang, Tong Zhang
[MoE, AI4CE] 08 Jun 2023

Soft Merging of Experts with Adaptive Routing
Mohammed Muqeeth, Haokun Liu, Colin Raffel
[MoMe, MoE] 06 Jun 2023

COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search
Shibal Ibrahim, Wenyu Chen, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder
[MoE] 05 Jun 2023

Brainformers: Trading Simplicity for Efficiency
Yan-Quan Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, ..., Zhifeng Chen, Quoc V. Le, Claire Cui, James Laudon, J. Dean
[MoE] 29 May 2023

Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
Sheng Shen, Le Hou, Yan-Quan Zhou, Nan Du, Shayne Longpre, ..., Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou
[ALM, MoE] 24 May 2023

One-stop Training of Multiple Capacity Models
Lan Jiang, Haoyang Huang, Dongdong Zhang, R. Jiang, Furu Wei
23 May 2023

Chain-of-Skills: A Configurable Model for Open-domain Question Answering
Kaixin Ma, Hao Cheng, Yu Zhang, Xiaodong Liu, Eric Nyberg, Jianfeng Gao
[LRM] 04 May 2023

Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
Da Xu, Maha Elbayad, Kenton W. Murray, Jean Maillard, Vedanuj Goswami
[MoE] 03 May 2023

A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training
Siddharth Singh, Olatunji Ruwase, A. A. Awan, Samyam Rajbhandari, Yuxiong He, A. Bhatele
[MoE] 11 Mar 2023

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers
Tianlong Chen, Zhenyu (Allen) Zhang, Ajay Jaiswal, Shiwei Liu, Zhangyang Wang
[MoE] 02 Mar 2023

MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
Jingjing Jiang, Nanning Zheng
[MoE] 02 Mar 2023

Modular Deep Learning
Jonas Pfeiffer, Sebastian Ruder, Ivan Vulić, E. Ponti
[MoMe, OOD] 22 Feb 2023

Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation
Maha Elbayad, Anna Y. Sun, Shruti Bhosale
[MoE] 15 Dec 2022

AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning
Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao
[MoE] 31 Oct 2022

Accelerating Distributed MoE Training and Inference with Lina
Jiamin Li, Yimin Jiang, Yibo Zhu, Cong Wang, Hong-Yu Xu
[MoE] 31 Oct 2022

AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation
Ganesh Jawahar, Subhabrata Mukherjee, Xiaodong Liu, Young Jin Kim, Muhammad Abdul-Mageed, L. Lakshmanan, Ahmed Hassan Awadallah, Sébastien Bubeck, Jianfeng Gao
[MoE] 14 Oct 2022

A Review of Sparse Expert Models in Deep Learning
W. Fedus, J. Dean, Barret Zoph
[MoE] 04 Sep 2022

ADMoE: Anomaly Detection with Mixture-of-Experts from Noisy Labels
Yue Zhao, Guoqing Zheng, Subhabrata Mukherjee, R. McCann, Ahmed Hassan Awadallah
[NoLa] 24 Aug 2022

Is a Modular Architecture Enough?
Sarthak Mittal, Yoshua Bengio, Guillaume Lajoie
06 Jun 2022

Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model
Xuan-Phi Nguyen, Shafiq R. Joty, Kui Wu, A. Aw
[LRM] 31 May 2022

Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers
R. Liu, Young Jin Kim, Alexandre Muzio, Hany Awadalla
[MoE] 28 May 2022

Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners
Shashank Gupta, Subhabrata Mukherjee, K. Subudhi, Eduardo Gonzalez, Damien Jose, Ahmed Hassan Awadallah, Jianfeng Gao
[MoE] 16 Apr 2022

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, T. Zhao, Weizhu Chen
[MoE] 15 Apr 2022

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals
Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul N. Bennett, Xia Song, Jianfeng Gao
13 Apr 2022

Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models
Ze-Feng Gao, Peiyu Liu, Wayne Xin Zhao, Zhong-Yi Lu, Ji-Rong Wen
[MoE] 02 Mar 2022

Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
[MoE] 18 Feb 2022

ST-MoE: Designing Stable and Transferable Sparse Expert Models
Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, J. Dean, Noam M. Shazeer, W. Fedus
[MoE] 17 Feb 2022

A Survey on Dynamic Neural Networks for Natural Language Processing
Canwen Xu, Julian McAuley
[AI4CE] 15 Feb 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He
14 Jan 2022

Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim, A. A. Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Awadalla
[MoE] 22 Sep 2021

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
[ELM] 20 Apr 2018