Mixture of Attention Heads: Selecting Attention Heads Per Token (arXiv: 2210.05144)
11 October 2022
Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
Tags: MoE
Papers citing "Mixture of Attention Heads: Selecting Attention Heads Per Token" (6 papers)
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber
Tags: MoE, VLM
01 May 2025
MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification
Yichu Xu, Di Wang, Hongzan Jiao, L. Zhang, L. Zhang
Tags: Mamba
29 Apr 2025
RouterKT: Mixture-of-Experts for Knowledge Tracing
Han Liao, Shuaishuai Zu
11 Apr 2025
Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu
Tags: MoE
13 Aug 2024
Tricks for Training Sparse Translation Models
Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, M. Lewis, Angela Fan
Tags: MoE
15 Oct 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Tags: MoE
17 Sep 2019