Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.05567
Cited By
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
8 April 2024
Bowen Pan
Yikang Shen
Haokun Liu
Mayank Mishra
Gaoyuan Zhang
Aude Oliva
Colin Raffel
Rameswar Panda
MoE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models"
7 / 7 papers shown
Title
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi
Clara Mohri
David Brandfonbrener
Alex Gu
Nikhil Vyas
Nikhil Anand
David Alvarez-Melis
Yuanzhi Li
Sham Kakade
Eran Malach
MoE
28
4
0
24 Oct 2024
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition
Artem Basharin
Andrei Chertkov
Ivan V. Oseledets
34
1
0
23 Oct 2024
Mixture of Attention Heads: Selecting Attention Heads Per Token
Xiaofeng Zhang
Yikang Shen
Zeyu Huang
Jie Zhou
Wenge Rong
Zhang Xiong
MoE
93
42
0
11 Oct 2022
Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou
Tao Lei
Han-Chu Liu
Nan Du
Yanping Huang
Vincent Zhao
Andrew M. Dai
Zhifeng Chen
Quoc V. Le
James Laudon
MoE
147
323
0
18 Feb 2022
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta
Yanping Huang
Ankur Bapna
M. Krikun
Dmitry Lepikhin
Minh-Thang Luong
Orhan Firat
MoE
119
104
0
24 Sep 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
245
1,977
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
223
4,424
0
23 Jan 2020
1