1701.06538
Cited By
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
23 January 2017
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
MoE
Papers citing
"Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer"
50 / 495 papers shown
The De-democratization of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research
N. Ahmed
Muntasir Wahed
23
106
0
22 Oct 2020
Anti-Distillation: Improving reproducibility of deep networks
G. Shamir
Lorenzo Coviello
42
20
0
19 Oct 2020
THIN: THrowable Information Networks and Application for Facial Expression Recognition In The Wild
Estèphe Arnaud
Arnaud Dapogny
Kévin Bailly
CVBM
29
23
0
15 Oct 2020
High-Capacity Expert Binary Networks
Adrian Bulat
Brais Martínez
Georgios Tzimiropoulos
MQ
27
57
0
07 Oct 2020
Scalable Transfer Learning with Expert Models
J. Puigcerver
C. Riquelme
Basil Mustafa
Cédric Renggli
André Susano Pinto
Sylvain Gelly
Daniel Keysers
N. Houlsby
34
62
0
28 Sep 2020
Pruning Convolutional Filters using Batch Bridgeout
Najeeb Khan
Ian Stavness
23
3
0
23 Sep 2020
Multi-modal Experts Network for Autonomous Driving
Shihong Fang
A. Choromańska
MoE
20
5
0
18 Sep 2020
Anomaly Detection by Recombining Gated Unsupervised Experts
Jan-Philipp Schulze
Philip Sperl
Konstantin Böttinger
29
1
0
31 Aug 2020
S2RMs: Spatially Structured Recurrent Modules
Nasim Rahaman
Anirudh Goyal
Muhammad Waleed Gondal
M. Wuthrich
Stefan Bauer
Yash Sharma
Yoshua Bengio
Bernhard Schölkopf
21
14
0
13 Jul 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Z. Chen
MoE
25
1,106
0
30 Jun 2020
Attention-based Quantum Tomography
Peter Cha
P. Ginsparg
Felix Wu
Juan Carrasquilla
Peter L. McMahon
Eun-Ah Kim
26
72
0
22 Jun 2020
The Depth-to-Width Interplay in Self-Attention
Yoav Levine
Noam Wies
Or Sharir
Hofit Bata
Amnon Shashua
30
45
0
22 Jun 2020
Learning to Branch for Multi-Task Learning
Pengsheng Guo
Chen-Yu Lee
Daniel Ulbricht
18
174
0
02 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
35
40,023
0
28 May 2020
Consistent and Flexible Selectivity Estimation for High-Dimensional Data
Yaoshu Wang
Chuan Xiao
Jianbin Qin
Rui Mao
Onizuka Makoto
Wei Wang
Rui Zhang
Yoshiharu Ishikawa
33
14
0
20 May 2020
TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain
Weitao Li
Pengfei Xu
Yang Katie Zhao
Haitong Li
Yuan Xie
Yingyan Lin
9
68
0
03 May 2020
Computation on Sparse Neural Networks: an Inspiration for Future Hardware
Fei Sun
Minghai Qin
Tianyun Zhang
Liu Liu
Yen-kuang Chen
Yuan Xie
29
7
0
24 Apr 2020
Conditional Channel Gated Networks for Task-Aware Continual Learning
Davide Abati
Jakub M. Tomczak
Tijmen Blankevoort
Simone Calderara
Rita Cucchiara
B. Bejnordi
CLL
33
184
0
31 Mar 2020
Learning Dynamic Routing for Semantic Segmentation
Yanwei Li
Lin Song
Yukang Chen
Zeming Li
Xinming Zhang
Xingang Wang
Jian Sun
SSeg
88
163
0
23 Mar 2020
Resolution Adaptive Networks for Efficient Inference
Le Yang
Yizeng Han
Xi Chen
Shiji Song
Jifeng Dai
Gao Huang
24
215
0
16 Mar 2020
Sparse Sinkhorn Attention
Yi Tay
Dara Bahri
Liu Yang
Donald Metzler
Da-Cheng Juan
23
330
0
26 Feb 2020
Learning to Continually Learn
Shawn L. E. Beaulieu
Lapo Frati
Thomas Miconi
Joel Lehman
Kenneth O. Stanley
Jeff Clune
Nick Cheney
KELM
CLL
46
146
0
21 Feb 2020
A Survey of Deep Learning Techniques for Neural Machine Translation
Shu Yang
Yuxin Wang
X. Chu
VLM
AI4TS
AI4CE
22
138
0
18 Feb 2020
Adversarial Robustness for Code
Pavol Bielik
Martin Vechev
AAML
16
89
0
11 Feb 2020
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Max Ryabinin
Anton I. Gusev
FedML
22
48
0
10 Feb 2020
Multi-site fMRI Analysis Using Privacy-preserving Federated Learning and Domain Adaptation: ABIDE Results
Xiaoxiao Li
Yufeng Gu
Nicha Dvornek
Lawrence H. Staib
P. Ventola
James S. Duncan
FedML
OOD
10
352
0
16 Jan 2020
Attention over Parameters for Dialogue Systems
Andrea Madotto
Zhaojiang Lin
Chien-Sheng Wu
Jamin Shin
Pascale Fung
30
20
0
07 Jan 2020
A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning
Soochan Lee
Junsoo Ha
Dongsu Zhang
Gunhee Kim
BDL
CLL
11
209
0
03 Jan 2020
Machine Unlearning
Lucas Bourtoule
Varun Chandrasekaran
Christopher A. Choquette-Choo
Hengrui Jia
Adelin Travers
Baiwu Zhang
David Lie
Nicolas Papernot
MU
27
807
0
09 Dec 2019
AdaFilter: Adaptive Filter Fine-tuning for Deep Transfer Learning
Yunhui Guo
Yandong Li
Liqiang Wang
Tajana Simunic
35
41
0
21 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
88
19,440
0
23 Oct 2019
Tree-gated Deep Mixture-of-Experts For Pose-robust Face Alignment
Estèphe Arnaud
Arnaud Dapogny
Kévin Bailly
CVBM
27
10
0
21 Oct 2019
Span Selection Pre-training for Question Answering
Michael R. Glass
A. Gliozzo
Rishav Chakravarti
Anthony Ferritto
Lin Pan
G P Shrivatsa Bhargav
Dinesh Garg
Avirup Sil
RALM
38
70
0
09 Sep 2019
Situational Fusion of Visual Representation for Visual Navigation
Bokui (William) Shen
Danfei Xu
Yuke Zhu
Leonidas J. Guibas
Fei-Fei Li
Silvio Savarese
SSL
24
62
0
24 Aug 2019
Deep Learning Based Chatbot Models
Richard Csaky
29
46
0
23 Aug 2019
Convergence Rates for Gaussian Mixtures of Experts
Nhat Ho
Chiao-Yu Yang
Michael I. Jordan
21
40
0
09 Jul 2019
4K-Memristor Analog-Grade Passive Crossbar Circuit
Hyungjin Kim
H. Nili
Mahmood Mahmoodi
D. Strukov
9
165
0
27 Jun 2019
Conditional Computation for Continual Learning
Min-Bin Lin
Jie Fu
Yoshua Bengio
CLL
18
10
0
16 Jun 2019
Lightweight Network Architecture for Real-Time Action Recognition
Alexander Kozlov
Vadim Andronov
Y. Gritsenko
ViT
25
33
0
21 May 2019
Priority-based Parameter Propagation for Distributed DNN Training
Anand Jayarajan
Jinliang Wei
Garth A. Gibson
Alexandra Fedorova
Gennady Pekhimenko
AI4CE
19
178
0
10 May 2019
Routing Networks and the Challenges of Modular and Compositional Computation
Clemens Rosenbaum
Ignacio Cases
Matthew D Riemer
Tim Klinger
37
78
0
29 Apr 2019
Ray Interference: a Source of Plateaus in Deep Reinforcement Learning
Tom Schaul
Diana Borsa
Joseph Modayil
Razvan Pascanu
11
63
0
25 Apr 2019
Declarative Recursive Computation on an RDBMS, or, Why You Should Use a Database For Distributed Machine Learning
Dimitrije Jankov
Shangyu Luo
Binhang Yuan
Zhuhua Cai
Jia Zou
C. Jermaine
Zekai J. Gao
23
60
0
25 Apr 2019
Model Slicing for Supporting Complex Analytics with Elastic Inference Cost and Resource Constraints
Shaofeng Cai
Gang Chen
Beng Chin Ooi
Jinyang Gao
25
19
0
03 Apr 2019
Many Task Learning with Task Routing
Gjorgji Strezoski
Nanne van Noord
M. Worring
MoE
24
96
0
28 Mar 2019
Continual Learning via Neural Pruning
Siavash Golkar
Michael Kagan
Kyunghyun Cho
CLL
21
158
0
11 Mar 2019
Mixture Models for Diverse Machine Translation: Tricks of the Trade
T. Shen
Myle Ott
Michael Auli
Marc'Aurelio Ranzato
MoE
33
148
0
20 Feb 2019
Choosing the Right Word: Using Bidirectional LSTM Tagger for Writing Support Systems
Victor Makarenkov
Lior Rokach
Bracha Shapira
16
35
0
08 Jan 2019
The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos
Hazel Doughty
W. Mayol-Cuevas
Dima Damen
30
138
0
13 Dec 2018
Deep Positron: A Deep Neural Network Using the Posit Number System
Zachariah Carmichael
Seyed Hamed Fatemi Langroudi
Char Khazanov
Jeffrey Lillie
J. Gustafson
Dhireesha Kudithipudi
MQ
9
96
0
05 Dec 2018