DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning (arXiv:2106.03760)

7 June 2021
Hussein Hazimeh, Zhe Zhao, Aakanksha Chowdhery, M. Sathiamoorthy, Yihua Chen, Rahul Mazumder, Lichan Hong, Ed H. Chi
MoE

Papers citing "DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning" (22 of 22 papers shown)

CoCoAFusE: Beyond Mixtures of Experts via Model Fusion
Aurelio Raffa Ugolini, M. Tanelli, Valentina Breschi
MoE · 02 May 2025

A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu, Sen Lin
MoE · 10 Mar 2025

Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts
Wenju Sun, Qingyong Li, Wen Wang, Yangli-ao Geng, Boyang Li
28 Jan 2025

Generate to Discriminate: Expert Routing for Continual Learning
Yewon Byun, Sanket Vaibhav Mehta, Saurabh Garg, Emma Strubell, Michael Oberst, Bryan Wilder, Zachary Chase Lipton
31 Dec 2024

ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
Xumeng Han, Longhui Wei, Zhiyang Dou, Zipeng Wang, Chenhui Qiang, Xin He, Yingfei Sun, Zhenjun Han, Qi Tian
MoE · 21 Oct 2024

Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Tongtian Yue, Longteng Guo, Jie Cheng, Xuange Gao, J. Liu
MoE · 14 Oct 2024

Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
Jun Luo, C. L. P. Chen, Shandong Wu
FedML, VLM, MoE · 14 Oct 2024

Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
Ghada Sokar, J. Obando-Ceron, Aaron C. Courville, Hugo Larochelle, Pablo Samuel Castro
MoE · 02 Oct 2024

SUTRA: Scalable Multilingual Language Model Architecture
Abhijit Bendale, Michael Sapienza, Steven Ripplinger, Simon Gibbs, Jaewon Lee, Pranav Mistry
LRM, ELM · 07 May 2024

Multimodal Clinical Trial Outcome Prediction with Large Language Models
Wenhao Zheng, Dongsheng Peng, Hongxia Xu, Yun-Qing Li, Hongtu Zhu, Tianfan Fu, Huaxiu Yao
09 Feb 2024

A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
Huy Nguyen, Pedram Akbarian, TrungTin Nguyen, Nhat Ho
22 Oct 2023

EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen, Longteng Guo, Jianxiang Sun, Shuai Shao, Zehuan Yuan, Liang Lin, Dongyu Zhang
MLLM, VLM, MoE · 23 Aug 2023

Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
Michael E. Sander, J. Puigcerver, Josip Djolonga, Gabriel Peyré, Mathieu Blondel
02 Feb 2023

Adaptive Pattern Extraction Multi-Task Learning for Multi-Step Conversion Estimations
Xuewen Tao, Mingming Ha, Xiaobo Guo, Qiongxu Ma, Ho Kei Cheng, Wenfang Lin
06 Jan 2023

HMOE: Hypernetwork-based Mixture of Experts for Domain Generalization
Jingang Qu, T. Faney, Zehao Wang, Patrick Gallinari, Soleiman Yousef, J. D. Hemptinne
OOD · 15 Nov 2022

M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Hanxue Liang, Zhiwen Fan, Rishov Sarkar, Ziyu Jiang, Tianlong Chen, Kai Zou, Yu Cheng, Cong Hao, Zhangyang Wang
MoE · 26 Oct 2022

Mixture of experts models for multilevel data: modelling framework and approximation theory
Tsz Chai Fung, Spark C. Tseung
30 Sep 2022

UFO: Unified Feature Optimization
Teng Xi, Yifan Sun, Deli Yu, Bi Li, Nan Peng, ..., Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang
21 Jul 2022

LibMTL: A Python Library for Multi-Task Learning
Baijiong Lin, Yu Zhang
OffRL, AI4CE · 27 Mar 2022

Dynamic and Context-Dependent Stock Price Prediction Using Attention Modules and News Sentiment
Nicole Koenigstein
AIFin · 13 Mar 2022

Unified Scaling Laws for Routed Language Models
Aidan Clark, Diego de Las Casas, Aurelia Guy, A. Mensch, Michela Paganini, ..., Oriol Vinyals, Jack W. Rae, Erich Elsen, Koray Kavukcuoglu, Karen Simonyan
MoE · 02 Feb 2022

EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate
Xiaonan Nie, Xupeng Miao, Shijie Cao, Lingxiao Ma, Qibin Liu, Jilong Xue, Youshan Miao, Yi Liu, Zhi-Xin Yang, Bin Cui
MoMe, MoE · 29 Dec 2021