Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.01334
Cited By
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
2 October 2023
Pingzhi Li
Zhenyu (Allen) Zhang
Prateek Yadav
Yi-Lin Sung
Yu Cheng
Mohit Bansal
Tianlong Chen
MoMe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
12 / 12 papers shown
Title
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang
Luohe Shi
Qiwei Li
Zuchao Li
Ping Wang
Bo Du
Mengjia Shen
Hai Zhao
MoE
59
0
0
06 May 2025
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities
Raman Dutt
Harleen Hanspal
Guoxuan Xia
Petru-Daniel Tudosiu
Alexander Black
Yongxin Yang
Steven G. McDonagh
Sarah Parisot
MoE
38
0
0
28 Mar 2025
CAMEx: Curvature-aware Merging of Experts
Dung V. Nguyen
Minh H. Nguyen
Luc Q. Nguyen
R. Teo
T. Nguyen
Linh Duy Tran
MoMe
73
2
0
26 Feb 2025
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models
Raeid Saqur
Anastasis Kratsios
Florian Krach
Yannick Limmer
Jacob-Junqi Tian
John Willes
Blanka Horvath
Frank Rudzicz
MoE
43
0
0
24 Feb 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
45
0
0
10 Jan 2025
Cuttlefish: Low-Rank Model Training without All the Tuning
Hongyi Wang
Saurabh Agarwal
Pongsakorn U-chupala
Yoshiki Tanaka
Eric P. Xing
Dimitris Papailiopoulos
OffRL
51
21
0
04 May 2023
PopulAtion Parameter Averaging (PAPA)
Alexia Jolicoeur-Martineau
Emy Gervais
Kilian Fatras
Yan Zhang
Simon Lacoste-Julien
MoMe
40
17
0
06 Apr 2023
Git Re-Basin: Merging Models modulo Permutation Symmetries
Samuel K. Ainsworth
J. Hayase
S. Srinivasa
MoMe
239
312
0
11 Sep 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,402
0
28 Jan 2022
Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim
A. A. Awan
Alexandre Muzio
Andres Felipe Cruz Salinas
Liyang Lu
Amr Hendy
Samyam Rajbhandari
Yuxiong He
Hany Awadalla
MoE
94
83
0
22 Sep 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,424
0
23 Jan 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
1