Sparse Backpropagation for MoE Training
arXiv:2310.00811 · 1 October 2023
Liyuan Liu, Jianfeng Gao, Weizhu Chen
MoE

Papers citing "Sparse Backpropagation for MoE Training" (10 of 10 shown)
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Ashwinee Panda, Vatsal Baherwani, Zain Sarwar, Benjamin Thérien, Supriyo Chakraborty, Tom Goldstein
16 Apr 2025 · MoE · 37 · 0 · 0
Adaptive Layer-skipping in Pre-trained LLMs
Xuan Luo, Weizhi Wang, Xifeng Yan
31 Mar 2025 · 110 · 0 · 0
Continual Pre-training of MoEs: How robust is your router?
Benjamin Thérien, Charles-Étienne Joseph, Zain Sarwar, Ashwinee Panda, Anirban Das, Shi-Xiong Zhang, Stephen Rawls, S., Eugene Belilovsky, Irina Rish
06 Mar 2025 · MoE · 73 · 0 · 0
Advancing On-Device Neural Network Training with TinyPropv2: Dynamic, Sparse, and Efficient Backpropagation
Marcus Rüb, Axel Sikora, Daniel Mueller-Gritschneder
11 Sep 2024 · 30 · 1 · 0
A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning
Prateek Yadav, Colin Raffel, Mohammed Muqeeth, Lucas Page-Caccia, Haokun Liu, Tianlong Chen, Mohit Bansal, Leshem Choshen, Alessandro Sordoni
13 Aug 2024 · MoMe · 41 · 21 · 0
Mixture of Experts with Mixture of Precisions for Tuning Quality of Service
HamidReza Imani, Abdolah Amirany, Tarek A. El-Ghazawi
19 Jul 2024 · MoE · 56 · 6 · 0
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin, Sam Ade Jacobs, A. A. Awan, J. Aneja, Ahmed Hassan Awadallah, ..., Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou
22 Apr 2024 · LRM · ALM · 56 · 1,023 · 0
Bridging Discrete and Backpropagation: Straight-Through and Beyond
Liyuan Liu, Chengyu Dong, Xiaodong Liu, Bin-Xia Yu, Jianfeng Gao
17 Apr 2023 · BDL · 18 · 20 · 0
Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts
W. Kool, Chris J. Maddison, A. Mnih
24 Sep 2021 · 26 · 10 · 0
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018 · ELM · 294 · 6,943 · 0