© 2025 ResearchTrend.AI, All rights reserved.
Towards Understanding Mixture of Experts in Deep Learning

4 August 2022
Zixiang Chen
Yihe Deng
Yue Wu
Quanquan Gu
Yuanzhi Li
    MLT
    MoE

Papers citing "Towards Understanding Mixture of Experts in Deep Learning"

32 / 32 papers shown
Federated Semantic Learning for Privacy-preserving Cross-domain Recommendation
Ziang Lu
Lei Guo
Xu Yu
Zhiyong Cheng
Xiaohui Han
Lei Zhu
40
0
0
29 Mar 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu
Sen Lin
MoE
120
1
0
10 Mar 2025
VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer
Xinyu Liu
Ailing Zeng
Wei Xue
Harry Yang
Wenhan Luo
Qifeng Liu
Yike Guo
VGen
157
0
0
09 Feb 2025
Mixture of Link Predictors on Graphs
Li Ma
Haoyu Han
Juanhui Li
Harry Shomer
Hui Liu
Xiaofeng Gao
Jiliang Tang
71
0
0
03 Jan 2025
Learning Mixtures of Experts with EM
Quentin Fruytier
Aryan Mokhtari
Sujay Sanghavi
MoE
26
0
0
09 Nov 2024
Context-Aware Token Selection and Packing for Enhanced Vision Transformer
Tianyi Zhang
B. Li
Jae-sun Seo
Yu Cao
33
0
0
31 Oct 2024
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi
Clara Mohri
David Brandfonbrener
Alex Gu
Nikhil Vyas
Nikhil Anand
David Alvarez-Melis
Yuanzhi Li
Sham Kakade
Eran Malach
MoE
28
4
0
24 Oct 2024
Collaborative and Efficient Personalization with Mixtures of Adaptors
Abdulla Jasem Almansoori
Samuel Horváth
Martin Takáč
FedML
42
2
0
04 Oct 2024
Exploring Domain Robust Lightweight Reward Models based on Router Mechanism
Hyuk Namgoong
Jeesu Jung
Sangkeun Jung
Yoonhyung Roh
30
0
0
24 Jul 2024
Extracting thin film structures of energy materials using transformers
Chen Zhang
V. Niemann
Peter Benedek
Thomas F. Jaramillo
Mathieu Doucet
22
0
0
24 Jun 2024
Predicting Exoplanetary Features with a Residual Model for Uniform and Gaussian Distributions
Andrew Sweet
OOD
31
0
0
16 Jun 2024
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models
Benjamin Biggs
Arjun Seshadri
Yang Zou
Achin Jain
Aditya Golatkar
Yusheng Xie
Alessandro Achille
Ashwin Swaminathan
Stefano Soatto
MoMe
DiffM
35
10
0
12 Jun 2024
Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach
Haoyu Han
Juanhui Li
Wei Huang
Xianfeng Tang
Hanqing Lu
Chen Luo
Hui Liu
Jiliang Tang
38
5
0
05 Jun 2024
Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study
Jinze Zhao
Peihao Wang
Zhangyang Wang
MoE
18
2
0
26 Mar 2024
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
Jieming Cui
Tengyu Liu
Nian Liu
Yaodong Yang
Yixin Zhu
Siyuan Huang
45
21
0
19 Mar 2024
MSGNet: Learning Multi-Scale Inter-Series Correlations for Multivariate Time Series Forecasting
Wanlin Cai
Yuxuan Liang
Xianggen Liu
Jianshuai Feng
Yuankai Wu
AI4TS
33
71
0
31 Dec 2023
MoE-AMC: Enhancing Automatic Modulation Classification Performance Using Mixture-of-Experts
Jiaxin Gao
Qinglong Cao
Yuntian Chen
13
5
0
04 Dec 2023
DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets
Yash Jain
Harkirat Singh Behl
Z. Kira
Vibhav Vineet
20
12
0
08 Nov 2023
SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data
Ruoxi Sun
Sercan Ö. Arik
Rajarishi Sinha
Hootan Nakhost
Hanjun Dai
Pengcheng Yin
Tomas Pfister
33
13
0
06 Nov 2023
Text Promptable Surgical Instrument Segmentation with Vision-Language Models
Zijian Zhou
Oluwatosin O. Alabi
Meng Wei
Tom Kamiel Magda Vercauteren
Miaojing Shi
MedIm
25
23
0
15 Jun 2023
Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks
Mohammed Nowaz Rabbani Chowdhury
Shuai Zhang
M. Wang
Sijia Liu
Pin-Yu Chen
MoE
21
17
0
07 Jun 2023
Towards Understanding Clean Generalization and Robust Overfitting in Adversarial Training
Binghui Li
Yuanzhi Li
AAML
26
3
0
02 Jun 2023
Additive Class Distinction Maps using Branched-GANs
Elnatan Kadar
Jonathan Brokman
Guy Gilboa
GAN
18
0
0
04 May 2023
Solving Regularized Exp, Cosh and Sinh Regression Problems
Zhihang Li
Zhao Song
Tianyi Zhou
23
39
0
28 Mar 2023
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset
Thanh-Dung Le
P. Jouvet
R. Noumeir
MoE
MedIm
67
5
0
22 Mar 2023
The Power of External Memory in Increasing Predictive Model Capacity
Cenk Baykal
D. Cutler
Nishanth Dikkala
Nikhil Ghosh
Rina Panigrahy
Xin Wang
KELM
13
0
0
31 Jan 2023
Alternating Updates for Efficient Transformers
Cenk Baykal
D. Cutler
Nishanth Dikkala
Nikhil Ghosh
Rina Panigrahy
Xin Wang
MoE
40
5
0
30 Jan 2023
Gated Self-supervised Learning For Improving Supervised Learning
Erland Hilman Fuadi
Aristo Renaldo Ruslim
Putu Wahyu Kusuma Wardhana
N. Yudistira
SSL
15
0
0
14 Jan 2023
Vision Transformers provably learn spatial structure
Samy Jelassi
Michael E. Sander
Yuanzhi Li
ViT
MLT
32
73
0
13 Oct 2022
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Difan Zou
Yuan Cao
Yuanzhi Li
Quanquan Gu
MLT
AI4CE
39
37
0
25 Aug 2021
End-To-End Data-Dependent Routing in Multi-Path Neural Networks
Dumindu Tissera
Rukshan Wijesinghe
Kasun Vithanage
A. Xavier
Subha Fernando
Ranga Rodrigo
MoE
18
0
0
06 Jul 2021
Non-asymptotic oracle inequalities for the Lasso in high-dimensional mixture of experts
TrungTin Nguyen
Hien Nguyen
Faicel Chamroukhi
Geoffrey J. McLachlan
21
1
0
22 Sep 2020