MoH: Multi-Head Attention as Mixture-of-Head Attention
arXiv: 2410.11842 · 15 October 2024
Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan
Tags: MoE
Links: ArXiv · PDF · HTML
Papers citing "MoH: Multi-Head Attention as Mixture-of-Head Attention" (5 / 5 papers shown)
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber · MoE, VLM · 01 May 2025
GMAR: Gradient-Driven Multi-Head Attention Rollout for Vision Transformer Interpretability
Sehyeong Jo, Gangjae Jang, Haesol Park · 28 Apr 2025
RouterKT: Mixture-of-Experts for Knowledge Tracing
Han Liao, Shuaishuai Zu · 11 Apr 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu, Sen Lin · MoE · 10 Mar 2025
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan · MoE · 09 Oct 2024