1701.06538
Cited By
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
23 January 2017
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
MoE
Papers citing
"Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer"
50 / 499 papers shown
Title
Modular Deep Learning
Jonas Pfeiffer
Sebastian Ruder
Ivan Vulić
E. Ponti
MoMe
OOD
32
73
0
22 Feb 2023
Reusable Slotwise Mechanisms
Trang Nguyen
Amin Mansouri
Kanika Madan
Khuong N. Nguyen
Kartik Ahuja
Dianbo Liu
Yoshua Bengio
OCL
28
4
0
21 Feb 2023
NU-AIR -- A Neuromorphic Urban Aerial Dataset for Detection and Localization of Pedestrians and Vehicles
Craig Iaboni
Thomas Kelly
Pramod Abichandani
21
2
0
18 Feb 2023
Massively Multilingual Shallow Fusion with Large Language Models
Ke Hu
Tara N. Sainath
Bo-wen Li
Nan Du
Yanping Huang
Andrew M. Dai
Yu Zhang
Rodrigo Cabrera
Z. Chen
Trevor Strohman
35
13
0
17 Feb 2023
Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform
Shiwei Zhang
Lansong Diao
Siyu Wang
Zongyan Cao
Yiliang Gu
Chang Si
Ziji Shi
Zhen Zheng
Chuan Wu
W. Lin
AI4CE
27
4
0
16 Feb 2023
Balanced Audiovisual Dataset for Imbalance Analysis
Wenke Xia
Xu Zhao
Xincheng Pang
Changqing Zhang
Di Hu
37
1
0
14 Feb 2023
AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
Alexandra Chronopoulou
Matthew E. Peters
Alexander Fraser
Jesse Dodge
MoMe
32
65
0
14 Feb 2023
A Study on ReLU and Softmax in Transformer
Kai Shen
Junliang Guo
Xuejiao Tan
Siliang Tang
Rui Wang
Jiang Bian
24
53
0
13 Feb 2023
Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
Vladimir Feinberg
Xinyi Chen
Y. Jennifer Sun
Rohan Anil
Elad Hazan
29
12
0
07 Feb 2023
Multipath agents for modular multitask ML systems
Andrea Gesmundo
28
1
0
06 Feb 2023
Improving Domain Generalization with Domain Relations
Huaxiu Yao
Xinyu Yang
Xinyi Pan
Shengchao Liu
Pang Wei Koh
Chelsea Finn
OOD
AI4CE
52
8
0
06 Feb 2023
Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
Michael E. Sander
J. Puigcerver
Josip Djolonga
Gabriel Peyré
Mathieu Blondel
21
18
0
02 Feb 2023
Towards Inference Efficient Deep Ensemble Learning
Ziyue Li
Kan Ren
Yifan Yang
Xinyang Jiang
Yuqing Yang
Dongsheng Li
BDL
23
12
0
29 Jan 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
30
31
0
27 Jan 2023
Federated Automatic Differentiation
Keith Rush
Zachary B. Charles
Zachary Garrett
FedML
34
1
0
18 Jan 2023
Gated Self-supervised Learning For Improving Supervised Learning
Erland Hilman Fuadi
Aristo Renaldo Ruslim
Putu Wahyu Kusuma Wardhana
N. Yudistira
SSL
23
0
0
14 Jan 2023
Multilingual Entity and Relation Extraction from Unified to Language-specific Training
Zixiang Wang
Jian Yang
Tongliang Li
Jiaheng Liu
Ying Mo
Jiaqi Bai
Longtao He
Zhoujun Li
20
2
0
11 Jan 2023
Deep Multi-stream Network for Video-based Calving Sign Detection
Ryosuke Hyodo
Teppei Nakano
Tetsuji Ogawa
14
0
0
10 Jan 2023
Dynamic Grained Encoder for Vision Transformers
Lin Song
Songyang Zhang
Songtao Liu
Zeming Li
Xuming He
Hongbin Sun
Jian Sun
Nanning Zheng
ViT
26
34
0
10 Jan 2023
AdaEnsemble: Learning Adaptively Sparse Structured Ensemble Network for Click-Through Rate Prediction
Yachen Yan
Liubo Li
14
3
0
06 Jan 2023
Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation
Fei Yuan
Yinquan Lu
Wenhao Zhu
Lingpeng Kong
Lei Li
Yu Qiao
Jingjing Xu
MoE
38
22
0
20 Dec 2022
RepMode: Learning to Re-parameterize Diverse Experts for Subcellular Structure Prediction
Donghao Zhou
Chunbin Gu
Junde Xu
Furui Liu
Qiong Wang
Guangyong Chen
Pheng-Ann Heng
MoE
13
4
0
20 Dec 2022
Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model
Yeskendir Koishekenov
Alexandre Berard
Vassilina Nikoulina
MoE
35
29
0
19 Dec 2022
Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation
Maha Elbayad
Anna Y. Sun
Shruti Bhosale
MoE
54
8
0
15 Dec 2022
ViTPose++: Vision Transformer for Generic Body Pose Estimation
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
ViT
42
40
0
07 Dec 2022
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Conglong Li
Z. Yao
Xiaoxia Wu
Minjia Zhang
Connor Holmes
Cheng Li
Yuxiong He
27
24
0
07 Dec 2022
Data-Efficient Finetuning Using Cross-Task Nearest Neighbors
Hamish Ivison
Noah A. Smith
Hannaneh Hajishirzi
Pradeep Dasigi
33
19
0
01 Dec 2022
Learning Label Modular Prompts for Text Classification in the Wild
Hailin Chen
Amrita Saha
Chenyu You
Steven C. H. Hoi
OOD
VLM
20
5
0
30 Nov 2022
SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers
Ameet Deshpande
Md Arafat Sultan
Anthony Ferritto
Ashwin Kalyan
Karthik Narasimhan
Avirup Sil
MoE
33
1
0
29 Nov 2022
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Trevor Gale
Deepak Narayanan
C. Young
Matei A. Zaharia
MoE
19
102
0
29 Nov 2022
Automatically Extracting Information in Medical Dialogue: Expert System And Attention for Labelling
Xinshi Wang
Daniel Tang
26
2
0
28 Nov 2022
AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning
Enneng Yang
Junwei Pan
Ximei Wang
Haibin Yu
Li Shen
Xihua Chen
Lei Xiao
Jie Jiang
G. Guo
38
43
0
28 Nov 2022
Spatial Mixture-of-Experts
Nikoli Dryden
Torsten Hoefler
MoE
34
9
0
24 Nov 2022
A Short Survey of Systematic Generalization
Yuanpeng Li
AI4CE
41
1
0
22 Nov 2022
HMOE: Hypernetwork-based Mixture of Experts for Domain Generalization
Jingang Qu
T. Faney
Zehao Wang
Patrick Gallinari
Soleiman Yousef
J. D. Hemptinne
OOD
24
7
0
15 Nov 2022
Hierarchically Structured Task-Agnostic Continual Learning
Heinke Hihn
Daniel A. Braun
BDL
CLL
19
8
0
14 Nov 2022
Efficiently Scaling Transformer Inference
Reiner Pope
Sholto Douglas
Aakanksha Chowdhery
Jacob Devlin
James Bradbury
Anselm Levskaya
Jonathan Heek
Kefan Xiao
Shivani Agrawal
J. Dean
34
295
0
09 Nov 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
116
2,310
0
09 Nov 2022
Using Deep Mixture-of-Experts to Detect Word Meaning Shift for TempoWiC
Ze Chen
Kangxu Wang
Zijian Cai
Jiewen Zheng
Jiarong He
Max Gao
Jason Zhang
MoE
14
3
0
07 Nov 2022
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji
Seungjun Nah
Xun Huang
Arash Vahdat
Jiaming Song
...
Timo Aila
S. Laine
Bryan Catanzaro
Tero Karras
Xuan Li
VLM
MoE
41
803
0
02 Nov 2022
Learning an Artificial Language for Knowledge-Sharing in Multilingual Translation
Danni Liu
Jan Niehues
19
5
0
02 Nov 2022
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning
Yaqing Wang
Sahaj Agarwal
Subhabrata Mukherjee
Xiaodong Liu
Jing Gao
Ahmed Hassan Awadallah
Jianfeng Gao
MoE
22
117
0
31 Oct 2022
M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Hanxue Liang
Zhiwen Fan
Rishov Sarkar
Ziyu Jiang
Tianlong Chen
Kai Zou
Yu Cheng
Cong Hao
Zhangyang Wang
MoE
42
81
0
26 Oct 2022
Iterative Patch Selection for High-Resolution Image Recognition
Benjamin Bergner
C. Lippert
Aravindh Mahendran
20
12
0
24 Oct 2022
BotsTalk: Machine-sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets
Minju Kim
Chaehyeong Kim
Yongho Song
Seung-won Hwang
Jinyoung Yeo
39
13
0
23 Oct 2022
Disentangling Reasoning Capabilities from Language Models with Compositional Reasoning Transformers
Wanjun Zhong
Tingting Ma
Jiahai Wang
Jian Yin
T. Zhao
Chin-Yew Lin
Nan Duan
LRM
CoGe
33
2
0
20 Oct 2022
On the Adversarial Robustness of Mixture of Experts
J. Puigcerver
Rodolphe Jenatton
C. Riquelme
Pranjal Awasthi
Srinadh Bhojanapalli
OOD
AAML
MoE
42
18
0
19 Oct 2022
Federated Learning with Privacy-Preserving Ensemble Attention Distillation
Xuan Gong
Liangchen Song
Rishi Vedula
Abhishek Sharma
Meng Zheng
...
Arun Innanje
Terrence Chen
Junsong Yuan
David Doermann
Ziyan Wu
FedML
23
27
0
16 Oct 2022
Neural Routing in Meta Learning
Jicang Cai
Saeed Vahidian
Weijia Wang
M. Joneidi
Bill Lin
18
0
0
14 Oct 2022
Mind's Eye: Grounded Language Model Reasoning through Simulation
Ruibo Liu
Jason W. Wei
S. Gu
Te-Yen Wu
Soroush Vosoughi
Claire Cui
Denny Zhou
Andrew M. Dai
ReLM
LRM
118
79
0
11 Oct 2022