GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
arXiv:2112.06905 · 13 December 2021
Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, M. Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathy Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V. Le, Yonghui Wu, Z. Chen, Claire Cui
Topics: ALM, MoE

Papers citing "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts"

Showing 42 of 142 citing papers.
SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers
A. Deshpande, Md Arafat Sultan, Anthony Ferritto, A. Kalyan, Karthik Narasimhan, Avirup Sil
Topics: MoE · Citations: 1 · 29 Nov 2022

Spatial Mixture-of-Experts
Nikoli Dryden, Torsten Hoefler
Topics: MoE · Citations: 9 · 24 Nov 2022

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen
Topics: ReLM, ReCod, LRM · Citations: 732 · 22 Nov 2022

Coreference Resolution through a seq2seq Transition-Based System
Bernd Bohnet, Chris Alberti, Michael Collins
Citations: 39 · 22 Nov 2022

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han
Topics: MQ · Citations: 731 · 18 Nov 2022

HMOE: Hypernetwork-based Mixture of Experts for Domain Generalization
Jingang Qu, T. Faney, Zehao Wang, Patrick Gallinari, Soleiman Yousef, J. D. Hemptinne
Topics: OOD · Citations: 7 · 15 Nov 2022

A Universal Discriminator for Zero-Shot Generalization
Haike Xu, Zongyu Lin, Jing Zhou, Yanan Zheng, Zhilin Yang
Topics: AI4CE · Citations: 14 · 15 Nov 2022

Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black Magic?
Jean-Baptiste Döderlein, M. Acher, D. Khelladi, B. Combemale
Citations: 33 · 26 Oct 2022

Will we run out of data? Limits of LLM scaling based on human-generated data
Pablo Villalobos, A. Ho, J. Sevilla, T. Besiroglu, Lennart Heim, Marius Hobbhahn
Topics: ALM · Citations: 108 · 26 Oct 2022

Scaling Instruction-Finetuned Language Models
Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, ..., Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason W. Wei
Topics: ReLM, LRM · Citations: 2,987 · 20 Oct 2022

On the Adversarial Robustness of Mixture of Experts
J. Puigcerver, Rodolphe Jenatton, C. Riquelme, Pranjal Awasthi, Srinadh Bhojanapalli
Topics: OOD, AAML, MoE · Citations: 18 · 19 Oct 2022

Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective
Ping Yang, Junjie Wang, Ruyi Gan, Xinyu Zhu, Lin Zhang, Ziwei Wu, Xinyu Gao, Jiaxing Zhang, Tetsuya Sakai
Topics: BDL · Citations: 25 · 16 Oct 2022

Mind's Eye: Grounded Language Model Reasoning through Simulation
Ruibo Liu, Jason W. Wei, S. Gu, Te-Yen Wu, Soroush Vosoughi, Claire Cui, Denny Zhou, Andrew M. Dai
Topics: ReLM, LRM · Citations: 79 · 11 Oct 2022

Few-Shot Anaphora Resolution in Scientific Protocols via Mixtures of In-Context Experts
Nghia T. Le, Fan Bai, Alan Ritter
Citations: 12 · 07 Oct 2022

Generate rather than Retrieve: Large Language Models are Strong Context Generators
W. Yu, Dan Iter, Shuohang Wang, Yichong Xu, Mingxuan Ju, Soumya Sanyal, Chenguang Zhu, Michael Zeng, Meng-Long Jiang
Topics: RALM, AIMat · Citations: 321 · 21 Sep 2022

Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz
Citations: 109 · 31 Aug 2022

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Saleh Soltan, Shankar Ananthakrishnan, Jack G. M. FitzGerald, Rahul Gupta, Wael Hamza, ..., Mukund Sridhar, Fabian Triefenbach, Apurv Verma, Gökhan Tür, Premkumar Natarajan
Citations: 82 · 02 Aug 2022

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ..., Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu
Topics: EGVM · Citations: 1,061 · 22 Jun 2022

Emergent Abilities of Large Language Models
Jason W. Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, ..., Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, J. Dean, W. Fedus
Topics: ELM, ReLM, LRM · Citations: 2,333 · 15 Jun 2022

Language Models are General-Purpose Interfaces
Y. Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei
Topics: MLLM · Citations: 95 · 13 Jun 2022

Tutel: Adaptive Mixture-of-Experts at Scale
Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, ..., Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Y. Xiong
Topics: MoE · Citations: 109 · 07 Jun 2022

Decentralized Training of Foundation Models in Heterogeneous Environments
Binhang Yuan, Yongjun He, Jared Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Ré, Ce Zhang
Citations: 90 · 02 Jun 2022

Eliciting and Understanding Cross-Task Skills with Task-Level Mixture-of-Experts
Qinyuan Ye, Juan Zha, Xiang Ren
Topics: MoE · Citations: 12 · 25 May 2022

UL2: Unifying Language Learning Paradigms
Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason W. Wei, ..., Tal Schuster, H. Zheng, Denny Zhou, N. Houlsby, Donald Metzler
Topics: AI4CE · Citations: 294 · 10 May 2022

Empirical Evaluation and Theoretical Analysis for Representation Learning: A Survey
Kento Nozawa, Issei Sato
Topics: AI4TS · Citations: 4 · 18 Apr 2022

SkillNet-NLU: A Sparsely Activated Model for General-Purpose Natural Language Understanding
Fan Zhang, Duyu Tang, Yong Dai, Cong Zhou, Shuangzhi Wu, Shuming Shi
Topics: CLL, MoE · Citations: 12 · 07 Mar 2022

DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei
Topics: MoE, AI4CE · Citations: 155 · 01 Mar 2022

Transformer Quality in Linear Time
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le
Citations: 222 · 21 Feb 2022

Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou, Tao Lei, Han-Chu Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
Topics: MoE · Citations: 327 · 18 Feb 2022

Unified Scaling Laws for Routed Language Models
Aidan Clark, Diego de Las Casas, Aurelia Guy, A. Mensch, Michela Paganini, ..., Oriol Vinyals, Jack W. Rae, Erich Elsen, Koray Kavukcuoglu, Karen Simonyan
Topics: MoE · Citations: 177 · 02 Feb 2022

One Student Knows All Experts Know: From Sparse to Dense
Fuzhao Xue, Xiaoxin He, Xiaozhe Ren, Yuxuan Lou, Yang You
Topics: MoMe, MoE · Citations: 20 · 26 Jan 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He
Citations: 283 · 14 Jan 2022

Efficient Large Scale Language Modeling with Mixtures of Experts
Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, ..., Jeff Wang, Luke Zettlemoyer, Mona T. Diab, Zornitsa Kozareva, Ves Stoyanov
Topics: MoE · Citations: 188 · 20 Dec 2021

Tricks for Training Sparse Translation Models
Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, M. Lewis, Angela Fan
Topics: MoE · Citations: 19 · 15 Oct 2021

Challenges in Detoxifying Language Models
Johannes Welbl, Amelia Glaese, J. Uesato, Sumanth Dathathri, John F. J. Mellor, Lisa Anne Hendricks, Kirsty Anderson, Pushmeet Kohli, Ben Coppin, Po-Sen Huang
Topics: LM&MA · Citations: 193 · 15 Sep 2021

Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
Topics: ALM, UQCV · Citations: 3,561 · 03 Sep 2021

GSPMD: General and Scalable Parallelization for ML Computation Graphs
Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Blake A. Hechtman, Yanping Huang, ..., Noam M. Shazeer, Shibo Wang, Tao Wang, Yonghui Wu, Zhifeng Chen
Topics: MoE · Citations: 127 · 10 May 2021

Carbon Emissions and Large Neural Network Training
David A. Patterson, Joseph E. Gonzalez, Quoc V. Le, Chen Liang, Lluís-Miquel Munguía, D. Rothchild, David R. So, Maud Texier, J. Dean
Topics: AI4CE · Citations: 643 · 21 Apr 2021

Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
Topics: MLAU, SILM · Citations: 1,812 · 14 Dec 2020

Efficient Transformers: A Survey
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
Topics: VLM · Citations: 1,101 · 14 Sep 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Topics: MoE · Citations: 1,817 · 17 Sep 2019

Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov, Kai Chen, G. Corrado, J. Dean
Topics: 3DV · Citations: 31,253 · 16 Jan 2013