Mixture of Attention Heads: Selecting Attention Heads Per Token

11 October 2022
Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
Topics: MoE

Papers citing "Mixture of Attention Heads: Selecting Attention Heads Per Token"

6 / 6 papers shown

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber
Topics: MoE, VLM
01 May 2025

MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification
Yichu Xu, Di Wang, Hongzan Jiao, L. Zhang, L. Zhang
Topics: Mamba
29 Apr 2025

RouterKT: Mixture-of-Experts for Knowledge Tracing
Han Liao, Shuaishuai Zu
11 Apr 2025

Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu
Topics: MoE
13 Aug 2024

Tricks for Training Sparse Translation Models
Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, M. Lewis, Angela Fan
Topics: MoE
15 Oct 2021

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Topics: MoE
17 Sep 2019