Multi-Head Mixture-of-Experts (arXiv:2404.15045)
23 April 2024
Xun Wu, Shaohan Huang, Wenhui Wang, Furu Wei | MoE

Papers citing "Multi-Head Mixture-of-Experts" (12 papers)

• UMoE: Unifying Attention and FFN with Shared Experts
  Yuanhang Yang, Chaozheng Wang, Jing Li | MoE | 12 May 2025
• MoQa: Rethinking MoE Quantization with Multi-stage Data-model Distribution Awareness
  Zihao Zheng, Xiuping Cui, Size Zheng, Maoliang Li, Jiayu Chen, Liang, Xiang Chen | MQ, MoE | 27 Mar 2025
• How Do Consumers Really Choose: Exposing Hidden Preferences with the Mixture of Experts Model
  Diego Vallarino | 03 Mar 2025
• BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR
  Guodong Ma, Wenxuan Wang, Lifeng Zhou, Yuting Yang, Yuke Li, Binbin Du | MoE | 22 Jan 2025
• Mixture of Hidden-Dimensions Transformer
  Yilong Chen, Junyuan Shang, Zhengyu Zhang, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua-Hong Wu, Haifeng Wang | MoE | 07 Dec 2024
• MH-MoE: Multi-Head Mixture-of-Experts
  Shaohan Huang, Xun Wu, Shuming Ma, Furu Wei | MoE | 25 Nov 2024
• MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
  Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan | MoE | 09 Oct 2024
• A Dynamic Approach to Stock Price Prediction: Comparing RNN and Mixture of Experts Models Across Different Volatility Profiles
  Diego Vallarino | 04 Oct 2024
• Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision
  Minglei Li, Peng Ye, Yongqi Huang, Lin Zhang, Tao Chen, Tong He, Jiayuan Fan, Wanli Ouyang | MoE | 05 Jun 2024
• Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
  Yongxin Guo, Zhenglin Cheng, Xiaoying Tang, Tao Lin | MoE | 23 May 2024
• Learning More Generalized Experts by Merging Experts in Mixture-of-Experts
  Sejik Park | FedML, CLL, MoMe | 19 May 2024
• Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
  Soravit Changpinyo, P. Sharma, Nan Ding, Radu Soricut | VLM | 17 Feb 2021