arXiv: 2210.14793
M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Neural Information Processing Systems (NeurIPS), 2022
26 October 2022
Hanxue Liang
Zhiwen Fan
Rishov Sarkar
Ziyu Jiang
Tianlong Chen
Kai Zou
Yu Cheng
Cong Hao
Zinan Lin
MoE
Links: arXiv (abs) · PDF · HTML · GitHub (118★)
Papers citing
"M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design"
50 / 71 papers shown
3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding
X. Wang
Chen Tang
Xiangyu Yue
Wei-Hong Li
3DV
129
0
0
25 Nov 2025
Parameter Aware Mamba Model for Multi-task Dense Prediction
Xinzhuo Yu
Yunzhi Zhuge
Sitong Gong
Lu Zhang
Pingping Zhang
Huchuan Lu
Mamba
268
0
0
18 Nov 2025
Proto-Former: Unified Facial Landmark Detection by Prototype Transformer
Shengkai Hu
Haozhe Qi
Jun Wan
Jiaxing Huang
Lefei Zhang
Hang Sun
Dacheng Tao
ViT
108
1
0
17 Oct 2025
Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts
Li Bai
Qingqing Ye
Xinwei Zhang
Sen Zhang
Zi Liang
Jianliang Xu
Haibo Hu
FedML
MIACV
MoE
247
0
0
15 Oct 2025
Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps
Do Tien Hai
T. T. N. Mai
T. Nguyen
Nhat Ho
Binh T. Nguyen
Christopher Drovandi
92
0
0
14 Oct 2025
Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning
Minghao Yang
Ren Togo
Guang Li
Takahiro Ogawa
Miki Haseyama
MoE
MoMe
93
0
0
01 Oct 2025
Rethinking Parameter Sharing for LLM Fine-Tuning with Multiple LoRAs
Hao Ban
Kaiyi Ji
MoE
133
0
0
29 Sep 2025
Revisit the Imbalance Optimization in Multi-task Learning: An Experimental Analysis
Yihang Guo
Tianyuan Yu
Liang Bai
Yanming Guo
Yirun Ruan
William Li
Weishi Zheng
100
2
0
28 Sep 2025
Joint Learning using Mixture-of-Expert-Based Representation for Enhanced Speech Generation and Robust Emotion Recognition
Jing-Tong Tzeng
John H. L. Hansen
Chi-Chun Lee
MoE
100
1
0
10 Sep 2025
Multi-Task Dense Prediction Fine-Tuning with Mixture of Fine-Grained Experts
Yangyang Xu
Xi Ye
Duo Su
MoE
MoMe
149
0
0
25 Jul 2025
Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training
Wooseong Jeong
Jegyeong Cho
Youngho Yoon
Kuk-Jin Yoon
TTA
222
0
0
10 Jul 2025
Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning
Wooseong Jeong
Kuk-Jin Yoon
247
0
0
10 Jul 2025
Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation
Szymon Płotka
Gizem Mert
Maciej Chrabaszcz
Ewa Szczurek
Arkadiusz Sitek
Mamba
MoE
172
1
0
08 Jul 2025
StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets
Anh-Quan Cao
Ivan Lopes
Raoul de Charette
171
1
0
09 Jun 2025
EndoARSS: Adapting Spatially-Aware Foundation Model for Efficient Activity Recognition and Semantic Segmentation in Endoscopic Surgery
Advanced Intelligent Systems (Adv. Intell. Syst.), 2025
Guankun Wang
Rui Tang
Mengya Xu
Long Bai
Huxin Gao
Hongliang Ren
121
0
0
07 Jun 2025
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer
Yilun Kong
Guozheng Ma
Qi Zhao
Haoyu Wang
Li Shen
Xueqian Wang
Dacheng Tao
MoE
OffRL
165
3
0
30 May 2025
Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks
Computer Vision and Pattern Recognition (CVPR), 2025
Uranik Berisha
Jens Mehnert
Alexandru Paul Condurache
MoE
170
1
0
21 May 2025
Injecting Imbalance Sensitivity for Multi-Task Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Zhipeng Zhou
Liu Liu
Peilin Zhao
Wei Gong
211
0
0
11 Mar 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu
Sen Lin
MoE
917
31
0
10 Mar 2025
Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning
Hanwen Zhong
Jiaxin Chen
Yutong Zhang
Di Huang
Yunhong Wang
MoE
250
0
0
12 Jan 2025
Expert Routing with Synthetic Data for Continual Learning
Yewon Byun
Sanket Vaibhav Mehta
Saurabh Garg
Emma Strubell
Michael Oberst
Bryan Wilder
Zachary Chase Lipton
441
0
0
22 Dec 2024
HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting
Shaohan Yu
Pan Deng
Yu Zhao
Jiaheng Liu
Zi'ang Wang
MoE
1.1K
0
0
30 Nov 2024
HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy
Shuqing Luo
Jie Peng
Pingzhi Li
Tianlong Chen
MoE
159
0
0
02 Nov 2024
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen
Thong T. Doan
Luong Tran
Van Nguyen
Quang Pham
MoE
520
4
0
01 Nov 2024
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Neural Information Processing Systems (NeurIPS), 2024
Ruisi Cai
Yeonju Ro
Geon-Woo Kim
Peihao Wang
Babak Ehteshami Bejnordi
Aditya Akella
Liang Luo
MoE
159
8
0
24 Oct 2024
ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
Xumeng Han
Longhui Wei
Bushi Liu
Zipeng Wang
Chenhui Qiang
Xin He
Yingfei Sun
Zhenjun Han
Qi Tian
MoE
381
11
0
21 Oct 2024
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
International Conference on Learning Representations (ICLR), 2024
Yuxiang Lu
Shengcao Cao
Yu-Xiong Wang
419
4
0
18 Oct 2024
Quadratic Gating Mixture of Experts: Statistical Insights into Self-Attention
Pedram Akbarian
Huy Le Nguyen
Xing Han
Nhat Ho
MoE
364
3
0
15 Oct 2024
Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization
European Conference on Computer Vision (ECCV), 2024
Hongtao Wu
Yijun Yang
Angelica I Aviles-Rivero
Jingjing Ren
Sixiang Chen
Zhaodong Sun
Lei Zhu
143
7
0
10 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
IEEE Circuits and Systems Magazine (IEEE CSM), 2024
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai (Helen) Li
Yiran Chen
161
17
0
08 Oct 2024
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild
Neural Information Processing Systems (NeurIPS), 2024
Xinyu Zhao
Zheyu Shen
Ruisi Cai
Yukun Zhou
Pingzhi Li
...
Binhang Yuan
Hongyi Wang
Ang Li
Zhangyang Wang
Tianlong Chen
MoMe
ALM
279
9
0
07 Oct 2024
Panoptic Perception for Autonomous Driving: A Survey
Yunge Li
Lanyu Xu
231
3
0
27 Aug 2024
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning
Yixiao Wang
Yifei Zhang
Mingxiao Huo
Ran Tian
Xiang Zhang
...
Chenfeng Xu
Pengliang Ji
Wei Zhan
Mingyu Ding
Masayoshi Tomizuka
MoE
270
42
0
01 Jul 2024
Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters
International Conference on Machine Learning (ICML), 2024
Yuhang Zhou
Zihua Zhao
Haolin Li
Siyuan Du
Jiangchao Yao
Ya Zhang
Yanfeng Wang
MoMe
MoE
219
6
0
14 Jun 2024
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Neural Information Processing Systems (NeurIPS), 2024
Xingkui Zhu
Yiran Guan
Dingkang Liang
Yuchao Chen
Yuliang Liu
Xiang Bai
MoE
163
10
0
07 Jun 2024
Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation
Rongyu Zhang
Aosong Cheng
Yulin Luo
Gaole Dai
Huanrui Yang
...
Ran Xu
Li Du
Yuan Du
Yanbing Jiang
Shanghang Zhang
MoE
TTA
161
9
0
26 May 2024
Mixture of Experts Meets Prompt-Based Continual Learning
Neural Information Processing Systems (NeurIPS), 2024
Minh Le
An Nguyen
Huy Nguyen
Trang Nguyen
Trang Pham
L. Ngo
Nhat Ho
CLL
431
31
0
23 May 2024
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
International Conference on Learning Representations (ICLR), 2024
Huy Le Nguyen
Pedram Akbarian
Trang Pham
Trang Nguyen
Shujian Zhang
Nhat Ho
MoE
301
2
0
23 May 2024
Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts
Huy Nguyen
Nhat Ho
Alessandro Rinaldo
297
14
0
22 May 2024
Joint-Task Regularization for Partially Labeled Multi-Task Learning
Computer Vision and Pattern Recognition (CVPR), 2024
Kento Nishi
Junsik Kim
Wanhua Li
Hanspeter Pfister
273
6
0
02 Apr 2024
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Boran Han
Shuai Zhang
Xingjian Shi
Markus Reichstein
188
40
0
01 Apr 2024
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
Ruijie Quan
Wenguan Wang
Zhibo Tian
Fan Ma
Yi Yang
179
22
0
29 Mar 2024
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Yuqi Yang
Peng-Tao Jiang
Qibin Hou
Hao Zhang
Jinwei Chen
Yue Liu
MoE
219
54
0
26 Mar 2024
Block Selective Reprogramming for On-device Training of Vision Transformers
Sreetama Sarkar
Souvik Kundu
Kai Zheng
Peter A. Beerel
174
5
0
25 Mar 2024
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data
Computer Vision and Pattern Recognition (CVPR), 2024
Hanrong Ye
Dan Xu
DiffM
177
12
0
22 Mar 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Zhiwu Lu
Guoxing Yang
234
4
0
07 Mar 2024
InterroGate: Learning to Share, Specialize, and Prune Representations for Multi-task Learning
B. Bejnordi
Gaurav Kumar
Amelie Royer
Christos Louizos
Tijmen Blankevoort
Mohsen Ghafoorian
CVBM
181
1
0
26 Feb 2024
Mixtures of Experts Unlock Parameter Scaling for Deep RL
J. Obando-Ceron
Ghada Sokar
Timon Willi
Clare Lyle
Jesse Farebrother
Jakob N. Foerster
Gintare Karolina Dziugaite
Doina Precup
Pablo Samuel Castro
426
56
0
13 Feb 2024
Differentially Private Training of Mixture of Experts Models
Pierre Tholoniat
Huseyin A. Inan
Janardhan Kulkarni
Robert Sim
MoE
144
2
0
11 Feb 2024
On Parameter Estimation in Deviated Gaussian Mixture of Experts
Huy Nguyen
Khai Nguyen
Nhat Ho
174
1
0
07 Feb 2024