Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers (arXiv:2205.14336, 28 May 2022)
R. Liu, Young Jin Kim, Alexandre Muzio, Hany Awadalla · MoE
Papers citing "Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers" (19 papers)
Importance Sampling via Score-based Generative Models (07 Feb 2025)
Heasung Kim, Taekyun Lee, Hyeji Kim, Gustavo de Veciana · MedIm, DiffM

Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation (23 Nov 2024)
Fahao Chen, Peng Li, Zicong Hong, Zhou Su, Song Guo · MoMe, MoE

Exploring the Benefit of Activation Sparsity in Pre-training (04 Oct 2024)
Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou · MoE

Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules (30 Jun 2024)
Xinglin Pan, Wenxiang Lin, S. Shi, Xiaowen Chu, Weinong Sun, Bo Li · MoE

Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture (12 Mar 2024)
Fei Wang, Dan Guo, Kun Li, Zhun Zhong, Mengqing Wang

Model Compression and Efficient Inference for Large Language Models: A Survey (15 Feb 2024)
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He · MQ

Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness (03 Oct 2023)
Young Jin Kim, Raffy Fahim, Hany Awadalla · MQ, MoE

Task-Based MoE for Multitask Multilingual Machine Translation (30 Aug 2023)
Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabás Póczós, Hany Awadalla · MoE

FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs (16 Aug 2023)
Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Awadalla · MQ

Soft Merging of Experts with Adaptive Routing (06 Jun 2023)
Mohammed Muqeeth, Haokun Liu, Colin Raffel · MoMe, MoE

Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity (03 May 2023)
Da Xu, Maha Elbayad, Kenton W. Murray, Jean Maillard, Vedanuj Goswami · MoE

Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation (15 Dec 2022)
Maha Elbayad, Anna Y. Sun, Shruti Bhosale · MoE

Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production (18 Nov 2022)
Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Awadalla · MoE

AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation (14 Oct 2022)
Ganesh Jawahar, Subhabrata Mukherjee, Xiaodong Liu, Young Jin Kim, Muhammad Abdul-Mageed, L. Lakshmanan, Ahmed Hassan Awadallah, Sébastien Bubeck, Jianfeng Gao · MoE

MoEC: Mixture of Expert Clusters (19 Jul 2022)
Yuan Xie, Shaohan Huang, Tianyu Chen, Furu Wei · MoE

Transformer with Memory Replay (19 May 2022)
R. Liu, Barzan Mozafari · OffRL

Scalable and Efficient MoE Training for Multitask Multilingual Models (22 Sep 2021)
Young Jin Kim, A. A. Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Awadalla · MoE

Dropout: Explicit Forms and Capacity Control (06 Mar 2020)
R. Arora, Peter L. Bartlett, Poorya Mianjy, Nathan Srebro

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (06 Jun 2015)
Y. Gal, Zoubin Ghahramani · UQCV, BDL