arXiv:2208.02813
Towards Understanding Mixture of Experts in Deep Learning
4 August 2022
Zixiang Chen
Yihe Deng
Yue Wu
Quanquan Gu
Yuanzhi Li
MLT
MoE
Papers citing
"Towards Understanding Mixture of Experts in Deep Learning"
32 / 32 papers shown
Title
Federated Semantic Learning for Privacy-preserving Cross-domain Recommendation
Ziang Lu
Lei Guo
Xu Yu
Zhiyong Cheng
Xiaohui Han
Lei Zhu
40
0
0
29 Mar 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu
Sen Lin
MoE
120
1
0
10 Mar 2025
VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer
Xinyu Liu
Ailing Zeng
Wei Xue
Harry Yang
Wenhan Luo
Qifeng Liu
Yike Guo
VGen
157
0
0
09 Feb 2025
Mixture of Link Predictors on Graphs
Li Ma
Haoyu Han
Juanhui Li
Harry Shomer
Hui Liu
Xiaofeng Gao
Jiliang Tang
71
0
0
03 Jan 2025
Learning Mixtures of Experts with EM
Quentin Fruytier
Aryan Mokhtari
Sujay Sanghavi
MoE
26
0
0
09 Nov 2024
Context-Aware Token Selection and Packing for Enhanced Vision Transformer
Tianyi Zhang
B. Li
Jae-sun Seo
Yu Cao
33
0
0
31 Oct 2024
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi
Clara Mohri
David Brandfonbrener
Alex Gu
Nikhil Vyas
Nikhil Anand
David Alvarez-Melis
Yuanzhi Li
Sham Kakade
Eran Malach
MoE
28
4
0
24 Oct 2024
Collaborative and Efficient Personalization with Mixtures of Adaptors
Abdulla Jasem Almansoori
Samuel Horváth
Martin Takáč
FedML
42
2
0
04 Oct 2024
Exploring Domain Robust Lightweight Reward Models based on Router Mechanism
Hyuk Namgoong
Jeesu Jung
Sangkeun Jung
Yoonhyung Roh
30
0
0
24 Jul 2024
Extracting thin film structures of energy materials using transformers
Chen Zhang
V. Niemann
Peter Benedek
Thomas F. Jaramillo
Mathieu Doucet
22
0
0
24 Jun 2024
Predicting Exoplanetary Features with a Residual Model for Uniform and Gaussian Distributions
Andrew Sweet
OOD
31
0
0
16 Jun 2024
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models
Benjamin Biggs
Arjun Seshadri
Yang Zou
Achin Jain
Aditya Golatkar
Yusheng Xie
Alessandro Achille
Ashwin Swaminathan
Stefano Soatto
MoMe
DiffM
35
10
0
12 Jun 2024
Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach
Haoyu Han
Juanhui Li
Wei Huang
Xianfeng Tang
Hanqing Lu
Chen Luo
Hui Liu
Jiliang Tang
38
5
0
05 Jun 2024
Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study
Jinze Zhao
Peihao Wang
Zhangyang Wang
MoE
18
2
0
26 Mar 2024
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
Jieming Cui
Tengyu Liu
Nian Liu
Yaodong Yang
Yixin Zhu
Siyuan Huang
45
21
0
19 Mar 2024
MSGNet: Learning Multi-Scale Inter-Series Correlations for Multivariate Time Series Forecasting
Wanlin Cai
Yuxuan Liang
Xianggen Liu
Jianshuai Feng
Yuankai Wu
AI4TS
33
71
0
31 Dec 2023
MoE-AMC: Enhancing Automatic Modulation Classification Performance Using Mixture-of-Experts
Jiaxin Gao
Qinglong Cao
Yuntian Chen
13
5
0
04 Dec 2023
DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets
Yash Jain
Harkirat Singh Behl
Z. Kira
Vibhav Vineet
20
12
0
08 Nov 2023
SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data
Ruoxi Sun
Sercan Ö. Arik
Rajarishi Sinha
Hootan Nakhost
Hanjun Dai
Pengcheng Yin
Tomas Pfister
33
13
0
06 Nov 2023
Text Promptable Surgical Instrument Segmentation with Vision-Language Models
Zijian Zhou
Oluwatosin O. Alabi
Meng Wei
Tom Kamiel Magda Vercauteren
Miaojing Shi
MedIm
25
23
0
15 Jun 2023
Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
Mohammed Nowaz Rabbani Chowdhury
Shuai Zhang
M. Wang
Sijia Liu
Pin-Yu Chen
MoE
21
17
0
07 Jun 2023
Towards Understanding Clean Generalization and Robust Overfitting in Adversarial Training
Binghui Li
Yuanzhi Li
AAML
26
3
0
02 Jun 2023
Additive Class Distinction Maps using Branched-GANs
Elnatan Kadar
Jonathan Brokman
Guy Gilboa
GAN
18
0
0
04 May 2023
Solving Regularized Exp, Cosh and Sinh Regression Problems
Zhihang Li
Zhao Song
Tianyi Zhou
23
39
0
28 Mar 2023
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset
Thanh-Dung Le
P. Jouvet
R. Noumeir
MoE
MedIm
67
5
0
22 Mar 2023
The Power of External Memory in Increasing Predictive Model Capacity
Cenk Baykal
D. Cutler
Nishanth Dikkala
Nikhil Ghosh
Rina Panigrahy
Xin Wang
KELM
13
0
0
31 Jan 2023
Alternating Updates for Efficient Transformers
Cenk Baykal
D. Cutler
Nishanth Dikkala
Nikhil Ghosh
Rina Panigrahy
Xin Wang
MoE
40
5
0
30 Jan 2023
Gated Self-supervised Learning For Improving Supervised Learning
Erland Hilman Fuadi
Aristo Renaldo Ruslim
Putu Wahyu Kusuma Wardhana
N. Yudistira
SSL
15
0
0
14 Jan 2023
Vision Transformers provably learn spatial structure
Samy Jelassi
Michael E. Sander
Yuanzhi Li
ViT
MLT
32
73
0
13 Oct 2022
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Difan Zou
Yuan Cao
Yuanzhi Li
Quanquan Gu
MLT
AI4CE
39
37
0
25 Aug 2021
End-To-End Data-Dependent Routing in Multi-Path Neural Networks
Dumindu Tissera
Rukshan Wijesinghe
Kasun Vithanage
A. Xavier
Subha Fernando
Ranga Rodrigo
MoE
18
0
0
06 Jul 2021
Non-asymptotic oracle inequalities for the Lasso in high-dimensional mixture of experts
TrungTin Nguyen
Hien Nguyen
Faicel Chamroukhi
Geoffrey J. McLachlan
21
1
0
22 Sep 2020