2005.06537
A Mixture of $h-1$ Heads is Better than $h$ Heads
13 May 2020
Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith
MoE
Papers citing
"A Mixture of $h-1$ Heads is Better than $h$ Heads"
7 / 7 papers shown

- RouterKT: Mixture-of-Experts for Knowledge Tracing
  Han Liao, Shuaishuai Zu (11 Apr 2025)
- A Call for Clarity in Beam Search: How It Works and When It Stops
  Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith (11 Apr 2022)
- Universal Simultaneous Machine Translation with Mixture-of-Experts Wait-k Policy
  Shaolei Zhang, Yang Feng [MoE] (11 Sep 2021)
- Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives
  Ben Saunders, Necati Cihan Camgöz, Richard Bowden [SLR] (23 Jul 2021)
- Multi-head or Single-head? An Empirical Comparison for Transformer Training
  Liyuan Liu, Jialu Liu, Jiawei Han (17 Jun 2021)
- Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
  Jonathan Pilault, Amine Elhattami, C. Pal [CLL, MoE] (19 Sep 2020)
- Classical Structured Prediction Losses for Sequence to Sequence Learning
  Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato [AIMat] (14 Nov 2017)