Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

23 January 2017
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
    MoE

Papers citing "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer"

50 / 499 papers shown
Task-Aware Specialization for Efficient and Robust Dense Retrieval for Open-Domain Question Answering
Hao Cheng
Hao Fang
Xiaodong Liu
Jianfeng Gao
RALM
35
5
0
11 Oct 2022
Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts
Tao Zhong
Zhixiang Chi
Li Gu
Yang Wang
Yuanhao Yu
Jingshan Tang
OOD
66
29
0
08 Oct 2022
Few-Shot Anaphora Resolution in Scientific Protocols via Mixtures of In-Context Experts
Nghia T. Le
Fan Bai
Alan Ritter
35
12
0
07 Oct 2022
Granularity-aware Adaptation for Image Retrieval over Multiple Tasks
Jon Almazán
ByungSoo Ko
Geonmo Gu
Diane Larlus
Yannis Kalantidis
ObjD
VLM
36
7
0
05 Oct 2022
SIMPLE: A Gradient Estimator for $k$-Subset Sampling
Kareem Ahmed
Zhe Zeng
Mathias Niepert
Guy Van den Broeck
BDL
45
24
0
04 Oct 2022
Modular Approach to Machine Reading Comprehension: Mixture of Task-Aware Experts
Anirudha Rayasam
Anush Kamath
G. B. Kalejaiye
MoE
19
0
0
04 Oct 2022
Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition
Ye Bai
Jie Li
W. Han
Hao Ni
Kaituo Xu
Zhuo Zhang
Cheng Yi
Xiaorui Wang
MoE
26
1
0
17 Sep 2022
Adapting to Non-Centered Languages for Zero-shot Multilingual Translation
Zhi Qu
Taro Watanabe
44
7
0
09 Sep 2022
The Role Of Biology In Deep Learning
Robert Bain
27
0
0
07 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
30
109
0
31 Aug 2022
Efficient Sparsely Activated Transformers
Salar Latifi
Saurav Muralidharan
M. Garland
MoE
19
2
0
31 Aug 2022
Accelerating Vision Transformer Training via a Patch Sampling Schedule
Bradley McDanel
C. Huynh
ViT
27
1
0
19 Aug 2022
Redesigning Multi-Scale Neural Network for Crowd Counting
Zhipeng Du
Miaojing Shi
Jiankang Deng
S. Zafeiriou
31
44
0
04 Aug 2022
Towards Understanding Mixture of Experts in Deep Learning
Zixiang Chen
Yihe Deng
Yue-bo Wu
Quanquan Gu
Yuan-Fang Li
MLT
MoE
27
53
0
04 Aug 2022
An Optimal Likelihood Free Method for Biological Model Selection
Vincent D. Zaballa
E. Hui
34
0
0
03 Aug 2022
On-Demand Resource Management for 6G Wireless Networks Using Knowledge-Assisted Dynamic Neural Networks
Longfei Ma
Nan Cheng
Xiucheng Wang
Ruijin Sun
Ning Lu
16
14
0
02 Aug 2022
The Neural Race Reduction: Dynamics of Abstraction in Gated Networks
Andrew M. Saxe
Shagun Sodhani
Sam Lewallen
AI4CE
30
34
0
21 Jul 2022
Adaptive Mixture of Experts Learning for Generalizable Face Anti-Spoofing
Qianyu Zhou
Ke-Yue Zhang
Taiping Yao
Ran Yi
Shouhong Ding
Lizhuang Ma
OOD
CVBM
25
47
0
20 Jul 2022
ERA: Expert Retrieval and Assembly for Early Action Prediction
Lin Geng Foo
Tianjiao Li
Hossein Rahmani
Qiuhong Ke
Xiaozhong Liu
19
15
0
20 Jul 2022
MoEC: Mixture of Expert Clusters
Yuan Xie
Shaohan Huang
Tianyu Chen
Furu Wei
MoE
40
11
0
19 Jul 2022
Neural Implicit Dictionary via Mixture-of-Expert Training
Peihao Wang
Zhiwen Fan
Tianlong Chen
Zhangyang Wang
25
12
0
08 Jul 2022
Device-Cloud Collaborative Recommendation via Meta Controller
Jiangchao Yao
Feng Wang
Xichen Ding
Shaohu Chen
Bo Han
Jingren Zhou
Hongxia Yang
30
17
0
07 Jul 2022
TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s
Felix Chern
Blake A. Hechtman
Andy Davis
Ruiqi Guo
David Majnemer
Surinder Kumar
102
22
0
28 Jun 2022
Recommender Transformers with Behavior Pathways
Zhiyu Yao
Xinyang Chen
Sinan Wang
Qinyan Dai
Yumeng Li
Tanchao Zhu
Mingsheng Long
19
3
0
13 Jun 2022
Towards Universal Sequence Representation Learning for Recommender Systems
Yupeng Hou
Shanlei Mu
Wayne Xin Zhao
Yaliang Li
Bolin Ding
Ji-Rong Wen
AI4TS
24
200
0
13 Jun 2022
Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models
Yang Shu
Zhangjie Cao
Ziyang Zhang
Jianmin Wang
Mingsheng Long
17
4
0
08 Jun 2022
Tutel: Adaptive Mixture-of-Experts at Scale
Changho Hwang
Wei Cui
Yifan Xiong
Ziyue Yang
Ze Liu
...
Joe Chau
Peng Cheng
Fan Yang
Mao Yang
Y. Xiong
MoE
97
110
0
07 Jun 2022
Text2Human: Text-Driven Controllable Human Image Generation
Yuming Jiang
Shuai Yang
Haonan Qiu
Wayne Wu
Chen Change Loy
Ziwei Liu
DiffM
116
46
0
31 May 2022
Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers
R. Liu
Young Jin Kim
Alexandre Muzio
Hany Awadalla
MoE
50
22
0
28 May 2022
Analyzing Tree Architectures in Ensembles via Neural Tangent Kernel
Ryuichi Kanoh
M. Sugiyama
31
2
0
25 May 2022
Eliciting and Understanding Cross-Task Skills with Task-Level Mixture-of-Experts
Qinyuan Ye
Juan Zha
Xiang Ren
MoE
18
12
0
25 May 2022
Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT
James Lee-Thorp
Joshua Ainslie
MoE
32
11
0
24 May 2022
Unified Modeling of Multi-Domain Multi-Device ASR Systems
Soumyajit Mitra
Swayambhu Nath Ray
Bharat Padi
Arunasish Sen
Raghavendra Bilgi
Harish Arsikere
Shalini Ghosh
A. Srinivasamurthy
Sri Garimella
34
3
0
13 May 2022
Fast Conditional Network Compression Using Bayesian HyperNetworks
Phuoc Nguyen
T. Tran
Ky Le
Sunil R. Gupta
Santu Rana
Dang Nguyen
Trong Nguyen
S. Ryan
Svetha Venkatesh
BDL
30
6
0
13 May 2022
Lifting the Curse of Multilinguality by Pre-training Modular Transformers
Jonas Pfeiffer
Naman Goyal
Xi Lin
Xian Li
James Cross
Sebastian Riedel
Mikel Artetxe
LRM
40
139
0
12 May 2022
Spot-adaptive Knowledge Distillation
Jie Song
Ying Chen
Jingwen Ye
Mingli Song
22
72
0
05 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
40
45
0
03 May 2022
Residual Mixture of Experts
Lemeng Wu
Mengchen Liu
Yinpeng Chen
Dongdong Chen
Xiyang Dai
Lu Yuan
MoE
22
36
0
20 Apr 2022
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
Simiao Zuo
Qingru Zhang
Chen Liang
Pengcheng He
T. Zhao
Weizhu Chen
MoE
22
38
0
15 Apr 2022
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
Thomas Wang
Adam Roberts
Daniel Hesslow
Teven Le Scao
Hyung Won Chung
Iz Beltagy
Julien Launay
Colin Raffel
31
167
0
12 Apr 2022
E^2TAD: An Energy-Efficient Tracking-based Action Detector
Xin Hu
Zhenyu Wu
Haoyuan Miao
Siqi Fan
Taiyu Long
...
Pengcheng Pi
Yi Wu
Zhou Ren
Zhangyang Wang
G. Hua
24
2
0
09 Apr 2022
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition
Zhao You
Shulin Feng
Dan Su
Dong Yu
22
9
0
07 Apr 2022
Dynamic Multimodal Fusion
Zihui Xue
R. Marculescu
39
48
0
31 Mar 2022
CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow
Xiuchao Sui
Shaohua Li
Xue Geng
Yan Wu
Xinxing Xu
Yong Liu
Rick Siow Mong Goh
Erik Cambria
ViT
37
95
0
31 Mar 2022
Efficient Reflectance Capture with a Deep Gated Mixture-of-Experts
Xiaohe Ma
Ya-Qi Yu
Hongzhi Wu
Kun Zhou
18
0
0
29 Mar 2022
Pathways: Asynchronous Distributed Dataflow for ML
P. Barham
Aakanksha Chowdhery
J. Dean
Sanjay Ghemawat
Steven Hand
...
Parker Schuh
Ryan Sepassi
Laurent El Shafey
C. A. Thekkath
Yonghui Wu
GNN
MoE
45
126
0
23 Mar 2022
Efficient Language Modeling with Sparse all-MLP
Ping Yu
Mikel Artetxe
Myle Ott
Sam Shleifer
Hongyu Gong
Ves Stoyanov
Xian Li
MoE
23
11
0
14 Mar 2022
SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization
Mathieu Ravaut
Chenyu You
Nancy F. Chen
MoE
16
91
0
13 Mar 2022
SkillNet-NLU: A Sparsely Activated Model for General-Purpose Natural Language Understanding
Fan Zhang
Duyu Tang
Yong Dai
Cong Zhou
Shuangzhi Wu
Shuming Shi
CLL
MoE
33
12
0
07 Mar 2022
Combining Modular Skills in Multitask Learning
E. Ponti
Alessandro Sordoni
Yoshua Bengio
Siva Reddy
MoE
12
37
0
28 Feb 2022