Merge, Then Compress: Demystify Efficient SMoE with Hints from Its
Routing Policy

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

2 October 2023

Zhenyu (Allen) Zhang

Mohit Bansal

Papers citing "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"

12 / 12 papers shown

Title
Faster MoE LLM Inference for Extremely Large Models Haoqi Yang Luohe Shi Qiwei Li Zuchao Li Ping Wang Bo Du Mengjia Shen Hai Zhao MoE 59 0 0 06 May 2025
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities Raman Dutt Harleen Hanspal Guoxuan Xia Petru-Daniel Tudosiu Alexander Black Yongxin Yang Steven G. McDonagh Sarah Parisot MoE 38 0 0 28 Mar 2025
CAMEx: Curvature-aware Merging of Experts Dung V. Nguyen Minh H. Nguyen Luc Q. Nguyen R. Teo T. Nguyen Linh Duy Tran MoMe 73 2 0 26 Feb 2025
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models Raeid Saqur Anastasis Kratsios Florian Krach Yannick Limmer Jacob-Junqi Tian John Willes Blanka Horvath Frank Rudzicz MoE 43 0 0 24 Feb 2025
Merging Feed-Forward Sublayers for Compressed Transformers Neha Verma Kenton W. Murray Kevin Duh AI4CE 45 0 0 10 Jan 2025
Cuttlefish: Low-Rank Model Training without All the Tuning Hongyi Wang Saurabh Agarwal Pongsakorn U-chupala Yoshiki Tanaka Eric P. Xing Dimitris Papailiopoulos OffRL 51 21 0 04 May 2023
PopulAtion Parameter Averaging (PAPA) Alexia Jolicoeur-Martineau Emy Gervais Kilian Fatras Yan Zhang Simon Lacoste-Julien MoMe 40 17 0 06 Apr 2023
Git Re-Basin: Merging Models modulo Permutation Symmetries Samuel K. Ainsworth J. Hayase S. Srinivasa MoMe 239 312 0 11 Sep 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Jason W. Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Brian Ichter F. Xia Ed H. Chi Quoc Le Denny Zhou LM&Ro LRM AI4CE ReLM 315 8,402 0 28 Jan 2022
Scalable and Efficient MoE Training for Multitask Multilingual Models Young Jin Kim A. A. Awan Alexandre Muzio Andres Felipe Cruz Salinas Liyang Lu Amr Hendy Samyam Rajbhandari Yuxiong He Hany Awadalla MoE 94 83 0 22 Sep 2021
Scaling Laws for Neural Language Models Jared Kaplan Sam McCandlish T. Henighan Tom B. Brown B. Chess R. Child Scott Gray Alec Radford Jeff Wu Dario Amodei 226 4,424 0 23 Jan 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Jinpeng Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 294 6,927 0 20 Apr 2018