MoH: Multi-Head Attention as Mixture-of-Head Attention

15 October 2024
Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan
Topic: MoE

Papers citing "MoH: Multi-Head Attention as Mixture-of-Head Attention"

5 of 5 citing papers shown.

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber · MoE, VLM · 01 May 2025

GMAR: Gradient-Driven Multi-Head Attention Rollout for Vision Transformer Interpretability
Sehyeong Jo, Gangjae Jang, Haesol Park · 28 Apr 2025

RouterKT: Mixture-of-Experts for Knowledge Tracing
Han Liao, Shuaishuai Zu · 11 Apr 2025

A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu, Sen Lin · MoE · 10 Mar 2025

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan · MoE · 09 Oct 2024