Taming Sparsely Activated Transformer with Stochastic Experts
Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, T. Zhao, Jianfeng Gao
arXiv:2110.04260 · MoE · 8 October 2021
Papers citing "Taming Sparsely Activated Transformer with Stochastic Experts" (50 of 83 papers shown)
QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration
HamidReza Imani, Jiaxin Peng, Peiman Mohseni, Abdolah Amirany, Tarek A. El-Ghazawi
MoE · 10 May 2025
You Don't Need All Attentions: Distributed Dynamic Fine-Tuning for Foundation Models
Shiwei Ding, Lan Zhang, Zhenlin Wang, Giuseppe Ateniese, Xiaoyong Yuan
16 Apr 2025
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
Mohan Zhang, Pingzhi Li, Jie Peng, Mufan Qiu, Tianlong Chen
MoE · 02 Apr 2025
Biologically Inspired Spiking Diffusion Model with Adaptive Lateral Selection Mechanism
Linghao Feng, Dongcheng Zhao, Sicheng Shen, Yi Zeng
31 Mar 2025
Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework
Soham Sane
MoE · 26 Mar 2025
Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores
Chenpeng Wu, Qiqi Gu, Heng Shi, Jianguo Yao, Haibing Guan
MoE · 13 Mar 2025
eMoE: Task-aware Memory Efficient Mixture-of-Experts-Based (MoE) Model Inference
Suraiya Tairin, Shohaib Mahmud, Haiying Shen, Anand Iyer
MoE · 10 Mar 2025
Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
06 Mar 2025
Sample Selection via Contrastive Fragmentation for Noisy Label Regression
C. Kim, Sangwoo Moon, Jihwan Moon, Dongyeon Woo, Gunhee Kim
NoLa · 25 Feb 2025
Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization
M. L. Olson, Neale Ratzlaff, Musashi Hinck, Man Luo, Sungduk Yu, Chendi Xue, Vasudev Lal
MoE, LRM · 15 Feb 2025
Importance Sampling via Score-based Generative Models
Heasung Kim, Taekyun Lee, Hyeji Kim, Gustavo de Veciana
MedIm, DiffM · 07 Feb 2025
Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning
Hanwen Zhong, Jiaxin Chen, Yutong Zhang, Di Huang, Yunhong Wang
MoE · 12 Jan 2025
Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation
Fahao Chen, Peng Li, Zicong Hong, Zhou Su, Song Guo
MoMe, MoE · 23 Nov 2024
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo, Tan M. Nguyen
MoE · 18 Oct 2024
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Minghao Zhu, Zhengpu Wang, Mengxian Hu, Ronghao Dang, Xiao Lin, Xun Zhou, Chengju Liu, Qijun Chen
14 Oct 2024
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Tongtian Yue, Longteng Guo, Jie Cheng, Xuange Gao, J. Liu
MoE · 14 Oct 2024
Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts
Sukwon Yun, Inyoung Choi, Jie Peng, Yangfan Wu, J. Bao, Qiyiwen Zhang, Jiayi Xin, Qi Long, Tianlong Chen
MoE · 10 Oct 2024
Exploring the Benefit of Activation Sparsity in Pre-training
Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou
MoE · 04 Oct 2024
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
Nikolas Gritsch, Qizhen Zhang, Acyr F. Locatelli, Sara Hooker, A. Ustun
MoE · 28 Aug 2024
Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu
MoE · 13 Aug 2024
Mixture of Experts with Mixture of Precisions for Tuning Quality of Service
HamidReza Imani, Abdolah Amirany, Tarek A. El-Ghazawi
MoE · 19 Jul 2024
Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement
Yongji Wu, Wenjie Qu, Tianyang Tao, Zhuang Wang, Wei Bai, Zhuohao Li, Yuan Tian, Jiaheng Zhang, Matthew Lentz, Danyang Zhuo
05 Jul 2024
Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules
Xinglin Pan, Wenxiang Lin, S. Shi, Xiaowen Chu, Weinong Sun, Bo Li
MoE · 30 Jun 2024
A Closer Look into Mixture-of-Experts in Large Language Models
Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu
MoE · 26 Jun 2024
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen, Xinyu Zhao, Tianlong Chen, Yu Cheng
MoE · 17 Jun 2024
MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors
Renzhi Wang, Piji Li
KELM · 29 May 2024
MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts
Jianan Zhou, Zhiguang Cao, Yaoxin Wu, Wen Song, Yining Ma, Jie Zhang, Chi Xu
02 May 2024
Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping
Chenyu Jiang, Ye Tian, Zhen Jia, Shuai Zheng, Chuan Wu, Yida Wang
MoMe · 30 Apr 2024
Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts
Shengzhuang Chen, Jihoon Tack, Yunqiao Yang, Yee Whye Teh, Jonathan Richard Schwarz, Ying Wei
MoE · 13 Mar 2024
Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models
Ning Ding, Yulin Chen, Ganqu Cui, Xingtai Lv, Weilin Zhao, Ruobing Xie, Bowen Zhou, Zhiyuan Liu, Maosong Sun
ALM, MoMe, AI4CE · 13 Mar 2024
Harder Tasks Need More Experts: Dynamic Routing in MoE Models
Quzhe Huang, Zhenwei An, Zhuang Nan, Mingxu Tao, Chen Zhang, ..., Kun Xu, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng
MoE · 12 Mar 2024
Conditional computation in neural networks: principles and research trends
Simone Scardapane, Alessandro Baiocchi, Alessio Devoto, V. Marsocci, Pasquale Minervini, Jary Pomponi
12 Mar 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He
MQ · 15 Feb 2024
BECoTTA: Input-dependent Online Blending of Experts for Continual Test-time Adaptation
Daeun Lee, Jaehong Yoon, Sung Ju Hwang
CLL, TTA · 13 Feb 2024
Differentially Private Training of Mixture of Experts Models
Pierre Tholoniat, Huseyin A. Inan, Janardhan Kulkarni, Robert Sim
MoE · 11 Feb 2024
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition
Quang-Cuong Pham, Giang Do, Huy Nguyen, TrungTin Nguyen, Chenghao Liu, ..., Binh T. Nguyen, Savitha Ramasamy, Xiaoli Li, Steven C. H. Hoi, Nhat Ho
04 Feb 2024
Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model
Zihan Zhong, Zhiqiang Tang, Tong He, Haoyang Fang, Chun Yuan
31 Jan 2024
MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts
Zhitian Xie, Yinger Zhang, Chenyi Zhuang, Qitao Shi, Zhining Liu, Jinjie Gu, Guannan Zhang
MoE · 31 Jan 2024
LocMoE: A Low-Overhead MoE for Large Language Model Training
Jing Li, Zhijie Sun, Xuan He, Li Zeng, Yi Lin, Entong Li, Binfan Zheng, Rongqian Zhao, Xin Chen
MoE · 25 Jan 2024
MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning
Qizhe Zhang, Bocheng Zou, Ruichuan An, Jiaming Liu, Shanghang Zhang
MoE · 05 Dec 2023
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu, Xia Hu, Yaqing Wang, Bo Pang, Radu Soricut
MoE · 01 Dec 2023
Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach
Feng Luo, Jin-Peng Xiang, Jun Zhang, Xiao Han, Wei Yang
DiffM · 18 Oct 2023
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao
MoE · 15 Oct 2023
Adaptive Gating in Mixture-of-Experts based Language Models
Jiamin Li, Qiang Su, Yitao Yang, Yimin Jiang, Cong Wang, Hong-Yu Xu
MoE · 11 Oct 2023
Sparse Backpropagation for MoE Training
Liyuan Liu, Jianfeng Gao, Weizhu Chen
MoE · 01 Oct 2023
Associative Transformer
Yuwei Sun, H. Ochiai, Zhirong Wu, Stephen Lin, Ryota Kanai
ViT · 22 Sep 2023
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
Ted Zadouri, A. Ustun, Arash Ahmadian, Beyza Ermics, Acyr F. Locatelli, Sara Hooker
MoE · 11 Sep 2023
Task-Based MoE for Multitask Multilingual Machine Translation
Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabás Póczós, Hany Awadalla
MoE · 30 Aug 2023
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
Ranggi Hwang, Jianyu Wei, Shijie Cao, Changho Hwang, Xiaohu Tang, Ting Cao, Mao Yang
MoE · 23 Aug 2023
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
Wenyan Cong, Hanxue Liang, Peihao Wang, Zhiwen Fan, Tianlong Chen, M. Varma, Yi Wang, Zhangyang Wang
MoE · 22 Aug 2023