FastMoE: A Fast Mixture-of-Expert Training System

24 March 2021
Jiaao He, J. Qiu, Aohan Zeng, Zhilin Yang, Jidong Zhai, Jie Tang
ALM, MoE

Papers citing "FastMoE: A Fast Mixture-of-Expert Training System"

50 / 63 papers shown
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang, Luohe Shi, Qiwei Li, Zuchao Li, Ping Wang, Bo Du, Mengjia Shen, Hai Zhao
MoE · 06 May 2025

Taming the Titans: A Survey of Efficient LLM Inference Serving
Ranran Zhen, J. Li, Yixin Ji, Z. Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Z. Wang, Baoxing Huai, M. Zhang
LLMAG · 28 Apr 2025

Accelerating MoE Model Inference with Expert Sharding
Oana Balmau, Anne-Marie Kermarrec, Rafael Pires, André Loureiro Espírito Santo, M. Vos, Milos Vujasinovic
MoE · 11 Mar 2025

eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin, Shohaib Mahmud, Haiying Shen, Anand Iyer
MoE · 10 Mar 2025

Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling
Yan Li, Pengfei Zheng, Shuang Chen, Zewei Xu, Yuanhao Lai, Yunfei Du, Z. Wang
MoE · 06 Mar 2025

Sample Selection via Contrastive Fragmentation for Noisy Label Regression
C. Kim, Sangwoo Moon, Jihwan Moon, Dongyeon Woo, Gunhee Kim
NoLa · 25 Feb 2025

MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing
Seokjin Go, Divya Mahajan
MoE · 10 Feb 2025

Importance Sampling via Score-based Generative Models
Heasung Kim, Taekyun Lee, Hyeji Kim, Gustavo de Veciana
MedIm, DiffM · 07 Feb 2025

Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution
Haiquan Wang, Chaoyi Ruan, Jia He, Jiaqi Ruan, Chengjie Tang, Xiaosong Ma, Cheng-rong Li
24 Nov 2024

Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation
Fahao Chen, Peng Li, Zicong Hong, Zhou Su, Song Guo
MoMe, MoE · 23 Nov 2024

HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy
Shuqing Luo, Jie Peng, Pingzhi Li, Tianlong Chen
MoE · 02 Nov 2024

LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen, Thong T. Doan, Luong Tran, Van Nguyen, Quang Pham
MoE · 01 Nov 2024

Efficient and Interpretable Grammatical Error Correction with Mixture of Experts
Muhammad Reza Qorib, Alham Fikri Aji, Hwee Tou Ng
KELM, MoE · 30 Oct 2024

MALoRA: Mixture of Asymmetric Low-Rank Adaptation for Enhanced Multi-Task Learning
Xujia Wang, Haiyan Zhao, Shuo Wang, Hanqing Wang, Zhiyuan Liu
MoMe, MoE · 30 Oct 2024

EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
Yulei Qian, Fengcun Li, Xiangyang Ji, Xiaoyu Zhao, Jianchao Tan, K. Zhang, Xunliang Cai
MoE · 16 Oct 2024

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun
29 Jul 2024

Powering In-Database Dynamic Model Slicing for Structured Data Analytics
Lingze Zeng, Naili Xing, Shaofeng Cai, Gang Chen, Bengchin Ooi, Jian Pei, Yuncheng Wu
01 May 2024

Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models
Songtao Jiang, Tuo Zheng, Yan Zhang, Yeying Jin, Li Yuan, Zuozhu Liu
MoE · 16 Apr 2024

Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Weilin Cai, Juyong Jiang, Le Qin, Junwei Cui, Sunghun Kim, Jiayi Huang
07 Apr 2024

Vanilla Transformers are Transfer Capability Teachers
Xin Lu, Yanyan Zhao, Bing Qin
MoE · 04 Mar 2024

Rethinking RGB Color Representation for Image Restoration Models
Jaerin Lee, J. Park, Sungyong Baik, Kyoung Mu Lee
05 Feb 2024

LocMoE: A Low-Overhead MoE for Large Language Model Training
Jing Li, Zhijie Sun, Xuan He, Li Zeng, Yi Lin, Entong Li, Binfan Zheng, Rongqian Zhao, Xin Chen
MoE · 25 Jan 2024

Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
Jinghan Yao, Quentin G. Anthony, A. Shafi, Hari Subramoni, Dhabaleswar K. Panda
MoE · 16 Jan 2024

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
Shiwei Zhang, Lansong Diao, Chuan Wu, Zongyan Cao, Siyu Wang, Wei Lin
11 Jan 2024

Training and Serving System of Foundation Models: A Comprehensive Survey
Jiahang Zhou, Yanyu Chen, Zicong Hong, Wuhui Chen, Yue Yu, Tao Zhang, Hui Wang, Chuan-fu Zhang, Zibin Zheng
ALM · 05 Jan 2024

Understanding LLMs: A Comprehensive Overview from Training to Inference
Yi-Hsueh Liu, Haoyang He, Tianle Han, Xu-Yao Zhang, Mengyuan Liu, ..., Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, Bao Ge
SyDa · 04 Jan 2024

OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers
Hanming Liang, Jiacheng Bao, Ruichi Zhang, Sihan Ren, Yuecheng Xu, Sibei Yang, Xin Chen, Jingyi Yu, Lan Xu
14 Dec 2023

Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
Keming Lu, Hongyi Yuan, Runji Lin, Junyang Lin, Zheng Yuan, Chang Zhou, Jingren Zhou
MoE, LRM · 15 Nov 2023

Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao
MoE · 15 Oct 2023

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
Pingzhi Li, Zhenyu (Allen) Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen
MoMe · 02 Oct 2023

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
Ranggi Hwang, Jianyu Wei, Shijie Cao, Changho Hwang, Xiaohu Tang, Ting Cao, Mao Yang
MoE · 23 Aug 2023

Experts Weights Averaging: A New General Training Scheme for Vision Transformers
Yongqian Huang, Peng Ye, Xiaoshui Huang, Sheng R. Li, Tao Chen, Tong He, Wanli Ouyang
MoMe · 11 Aug 2023

MediaGPT: A Large Language Model For Chinese Media
Zhonghao Wang, Zijia Lu, Boshen Jin, Haiying Deng
LM&MA · 20 Jul 2023

A Comprehensive Overview of Large Language Models
Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, Ajmal Saeed Mian
OffRL · 12 Jul 2023

ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
Haoran You, Huihong Shi, Yipin Guo, Yingyan Lin
10 Jun 2023

Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts
Rishov Sarkar, Hanxue Liang, Zhiwen Fan, Zhangyang Wang, Cong Hao
MoE · 30 May 2023

RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
Zeyue Xue, Guanglu Song, Qiushan Guo, Boxiao Liu, Zhuofan Zong, Yu Liu, Ping Luo
DiffM · 29 May 2023

Lifting the Curse of Capacity Gap in Distilling Language Models
Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song
MoE · 20 May 2023

UKP-SQuARE v3: A Platform for Multi-Agent QA Research
Haritz Puerto, Tim Baumgärtner, Rachneet Sachdeva, Haishuo Fang, Haotian Zhang, Sewin Tariverdian, Kexin Wang, Iryna Gurevych
31 Mar 2023

Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Haiyang Huang, Newsha Ardalani, Anna Y. Sun, Liu Ke, Hsien-Hsin S. Lee, Anjali Sridhar, Shruti Bhosale, Carole-Jean Wu, Benjamin C. Lee
MoE · 10 Mar 2023

TA-MoE: Topology-Aware Large Scale Mixture-of-Expert Training
Chang-Qin Chen, Min Li, Zhihua Wu, Dianhai Yu, Chao Yang
MoE · 20 Feb 2023

Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition
Ye Bai, Jie Li, W. Han, Hao Ni, Kaituo Xu, Zhuo Zhang, Cheng Yi, Xiaorui Wang
MoE · 17 Sep 2022

Efficient Sparsely Activated Transformers
Salar Latifi, Saurav Muralidharan, M. Garland
MoE · 31 Aug 2022

Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries
Xiao Liu, Shiyu Zhao, Kai Su, Yukuo Cen, J. Qiu, Mengdi Zhang, Wei Yu Wu, Yuxiao Dong, Jie Tang
16 Aug 2022

Neural Implicit Dictionary via Mixture-of-Expert Training
Peihao Wang, Zhiwen Fan, Tianlong Chen, Zhangyang Wang
08 Jul 2022

Optimizing Mixture of Experts using Dynamic Recompilations
Ferdinand Kossmann, Zhihao Jia, A. Aiken
04 May 2022

Residual Mixture of Experts
Lemeng Wu, Mengchen Liu, Yinpeng Chen, Dongdong Chen, Xiyang Dai, Lu Yuan
MoE · 20 Apr 2022

Towards Efficient Single Image Dehazing and Desnowing
Tian-Chun Ye, Sixiang Chen, Yun-Peng Liu, Erkang Chen, Yuche Li
19 Apr 2022

HetuMoE: An Efficient Trillion-scale Mixture-of-Expert Distributed Training System
Xiaonan Nie, Pinxue Zhao, Xupeng Miao, Tong Zhao, Bin Cui
MoE · 28 Mar 2022

A World-Self Model Towards Understanding Intelligence
Yutao Yue
25 Mar 2022