Soft Merging of Experts with Adaptive Routing

Soft Merging of Experts with Adaptive Routing

6 June 2023

Mohammed Muqeeth

Papers citing "Soft Merging of Experts with Adaptive Routing"

15 / 15 papers shown

Title
FT-MoE: Sustainable-learning Mixture of Experts Model for Fault-Tolerant Computing with Multiple Tasks Wenjing Xiao Wenhao Song Miaojiang Chen Ruikun Luo Min Chen MoE 44 0 0 29 Apr 2025
CAMEx: Curvature-aware Merging of Experts Dung V. Nguyen Minh H. Nguyen Luc Q. Nguyen R. Teo T. Nguyen Linh Duy Tran MoMe 63 2 0 26 Feb 2025
Towards Modular LLMs by Building and Reusing a Library of LoRAs O. Ostapenko Zhan Su E. Ponti Laurent Charlin Nicolas Le Roux Matheus Pereira Lucas Page-Caccia Alessandro Sordoni MoMe 27 30 0 18 May 2024
From Sparse to Soft Mixtures of Experts J. Puigcerver C. Riquelme Basil Mustafa N. Houlsby MoE 118 114 0 02 Aug 2023
$π$ -Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation Chengyue Wu Teng Wang Yixiao Ge Zeyu Lu Rui-Zhi Zhou Ying Shan Ping Luo MoMe 70 35 0 27 Apr 2023
Git Re-Basin: Merging Models modulo Permutation Symmetries Samuel K. Ainsworth J. Hayase S. Srinivasa MoMe 239 313 0 11 Sep 2022
Linear Connectivity Reveals Generalization Strategies Jeevesh Juneja Rachit Bansal Kyunghyun Cho João Sedoc Naomi Saphra 232 45 0 24 May 2022
Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT James Lee-Thorp Joshua Ainslie MoE 27 11 0 24 May 2022
Mixture-of-Experts with Expert Choice Routing Yan-Quan Zhou Tao Lei Han-Chu Liu Nan Du Yanping Huang Vincent Zhao Andrew M. Dai Zhifeng Chen Quoc V. Le James Laudon MoE 147 323 0 18 Feb 2022
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts Stephen H. Bach Victor Sanh Zheng-Xin Yong Albert Webson Colin Raffel ... Khalid Almubarak Xiangru Tang Dragomir R. Radev Mike Tian-Jian Jiang Alexander M. Rush VLM 212 335 0 02 Feb 2022
Tricks for Training Sparse Translation Models Dheeru Dua Shruti Bhosale Vedanuj Goswami James Cross M. Lewis Angela Fan MoE 139 19 0 15 Oct 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization Victor Sanh Albert Webson Colin Raffel Stephen H. Bach Lintang Sutawika ... T. Bers Stella Biderman Leo Gao Thomas Wolf Alexander M. Rush LRM 203 1,651 0 15 Oct 2021
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference Sneha Kudugunta Yanping Huang Ankur Bapna M. Krikun Dmitry Lepikhin Minh-Thang Luong Orhan Firat MoE 119 104 0 24 Sep 2021
Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts W. Kool Chris J. Maddison A. Mnih 14 10 0 24 Sep 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Jinpeng Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 294 6,927 0 20 Apr 2018