M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Neural Information Processing Systems (NeurIPS), 2022
26 October 2022
Hanxue Liang
Zhiwen Fan
Rishov Sarkar
Ziyu Jiang
Tianlong Chen
Kai Zou
Yu Cheng
Cong Hao
Zinan Lin
    MoE
ArXiv (abs) · PDF · HTML · GitHub (118★)

Papers citing "M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design"

50 / 71 papers shown
3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding
X. Wang
Chen Tang
Xiangyu Yue
Wei-Hong Li
3DV
129
0
0
25 Nov 2025
Parameter Aware Mamba Model for Multi-task Dense Prediction
Xinzhuo Yu
Yunzhi Zhuge
Sitong Gong
Lu Zhang
Pingping Zhang
Huchuan Lu
Mamba
268
0
0
18 Nov 2025
Proto-Former: Unified Facial Landmark Detection by Prototype Transformer
Shengkai Hu
Haozhe Qi
Jun Wan
Jiaxing Huang
Lefei Zhang
Hang Sun
Dacheng Tao
ViT
108
1
0
17 Oct 2025
Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts
Li Bai
Qingqing Ye
Xinwei Zhang
Sen Zhang
Zi Liang
Jianliang Xu
Haibo Hu
FedML, MIA, CV, MoE
247
0
0
15 Oct 2025
Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency without Model Sweeps
Do Tien Hai
T. T. N. Mai
T. Nguyen
Nhat Ho
Binh T. Nguyen
Christopher Drovandi
92
0
0
14 Oct 2025
Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning
Minghao Yang
Ren Togo
Guang Li
Takahiro Ogawa
Miki Haseyama
MoE, MoMe
93
0
0
01 Oct 2025
Rethinking Parameter Sharing for LLM Fine-Tuning with Multiple LoRAs
Hao Ban
Kaiyi Ji
MoE
133
0
0
29 Sep 2025
Revisit the Imbalance Optimization in Multi-task Learning: An Experimental Analysis
Yihang Guo
Tianyuan Yu
Liang Bai
Yanming Guo
Yirun Ruan
William Li
Weishi Zheng
100
2
0
28 Sep 2025
Joint Learning using Mixture-of-Expert-Based Representation for Enhanced Speech Generation and Robust Emotion Recognition
Jing-Tong Tzeng
John H. L. Hansen
Chi-Chun Lee
MoE
100
1
0
10 Sep 2025
Multi-Task Dense Prediction Fine-Tuning with Mixture of Fine-Grained Experts
Yangyang Xu
Xi Ye
Duo Su
MoE, MoMe
149
0
0
25 Jul 2025
Synchronizing Task Behavior: Aligning Multiple Tasks during Test-Time Training
Wooseong Jeong
Jegyeong Cho
Youngho Yoon
Kuk-Jin Yoon
TTA
222
0
0
10 Jul 2025
Resolving Token-Space Gradient Conflicts: Token Space Manipulation for Transformer-Based Multi-Task Learning
Wooseong Jeong
Kuk-Jin Yoon
247
0
0
10 Jul 2025
Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation
Szymon Płotka
Gizem Mert
Maciej Chrabaszcz
Ewa Szczurek
Arkadiusz Sitek
Mamba, MoE
172
1
0
08 Jul 2025
StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets
Anh-Quan Cao
Ivan Lopes
Raoul de Charette
171
1
0
09 Jun 2025
EndoARSS: Adapting Spatially-Aware Foundation Model for Efficient Activity Recognition and Semantic Segmentation in Endoscopic Surgery
Advanced Intelligent Systems (Adv. Intell. Syst.), 2025
Guankun Wang
Rui Tang
Mengya Xu
Long Bai
Huxin Gao
Hongliang Ren
121
0
0
07 Jun 2025
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer
Yilun Kong
Guozheng Ma
Qi Zhao
Haoyu Wang
Li Shen
Xueqian Wang
Dacheng Tao
MoE, OffRL
165
3
0
30 May 2025
Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks
Computer Vision and Pattern Recognition (CVPR), 2025
Uranik Berisha
Jens Mehnert
Alexandru Paul Condurache
MoE
170
1
0
21 May 2025
Injecting Imbalance Sensitivity for Multi-Task Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Zhipeng Zhou
Liu Liu
Peilin Zhao
Wei Gong
211
0
0
11 Mar 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu
Sen Lin
MoE
917
31
0
10 Mar 2025
Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning
Hanwen Zhong
Jiaxin Chen
Yutong Zhang
Di Huang
Yunhong Wang
MoE
250
0
0
12 Jan 2025
Expert Routing with Synthetic Data for Continual Learning
Yewon Byun
Sanket Vaibhav Mehta
Saurabh Garg
Emma Strubell
Michael Oberst
Bryan Wilder
Zachary Chase Lipton
441
0
0
22 Dec 2024
HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting
Shaohan Yu
Pan Deng
Yu Zhao
Jiaheng Liu
Ziáng Wang
MoE
1.1K
0
0
30 Nov 2024
HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy
Shuqing Luo
Jie Peng
Pingzhi Li
Tianlong Chen
MoE
159
0
0
02 Nov 2024
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Nam V. Nguyen
Thong T. Doan
Luong Tran
Van Nguyen
Quang Pham
MoE
520
4
0
01 Nov 2024
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Neural Information Processing Systems (NeurIPS), 2024
Ruisi Cai
Yeonju Ro
Geon-Woo Kim
Peihao Wang
Babak Ehteshami Bejnordi
Aditya Akella
Liang Luo
MoE
159
8
0
24 Oct 2024
ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
Xumeng Han
Longhui Wei
Bushi Liu
Zipeng Wang
Chenhui Qiang
Xin He
Yingfei Sun
Zhenjun Han
Qi Tian
MoE
381
11
0
21 Oct 2024
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
International Conference on Learning Representations (ICLR), 2024
Yuxiang Lu
Shengcao Cao
Yu-Xiong Wang
419
4
0
18 Oct 2024
Quadratic Gating Mixture of Experts: Statistical Insights into Self-Attention
Pedram Akbarian
Huy Le Nguyen
Xing Han
Nhat Ho
MoE
364
3
0
15 Oct 2024
Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization
European Conference on Computer Vision (ECCV), 2024
Hongtao Wu
Yijun Yang
Angelica I Aviles-Rivero
Jingjing Ren
Sixiang Chen
Zhaodong Sun
Lei Zhu
143
7
0
10 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
IEEE Circuits and Systems Magazine (IEEE CSM), 2024
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai (Helen) Li
Yiran Chen
161
17
0
08 Oct 2024
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild
Neural Information Processing Systems (NeurIPS), 2024
Xinyu Zhao
Zheyu Shen
Ruisi Cai
Yukun Zhou
Pingzhi Li
...
Binhang Yuan
Hongyi Wang
Ang Li
Zhangyang Wang
Tianlong Chen
MoMe, ALM
279
9
0
07 Oct 2024
Panoptic Perception for Autonomous Driving: A Survey
Yunge Li
Lanyu Xu
231
3
0
27 Aug 2024
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning
Yixiao Wang
Yifei Zhang
Mingxiao Huo
Ran Tian
Xiang Zhang
...
Chenfeng Xu
Pengliang Ji
Wei Zhan
Mingyu Ding
Masayoshi Tomizuka
MoE
270
42
0
01 Jul 2024
Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters
International Conference on Machine Learning (ICML), 2024
Yuhang Zhou
Zihua Zhao
Haolin Li
Siyuan Du
Jiangchao Yao
Ya Zhang
Yanfeng Wang
MoMe, MoE
219
6
0
14 Jun 2024
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Neural Information Processing Systems (NeurIPS), 2024
Xingkui Zhu
Yiran Guan
Dingkang Liang
Yuchao Chen
Yuliang Liu
Xiang Bai
MoE
163
10
0
07 Jun 2024
Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation
Rongyu Zhang
Aosong Cheng
Yulin Luo
Gaole Dai
Huanrui Yang
...
Ran Xu
Li Du
Yuan Du
Yanbing Jiang
Shanghang Zhang
MoE, TTA
161
9
0
26 May 2024
Mixture of Experts Meets Prompt-Based Continual Learning
Neural Information Processing Systems (NeurIPS), 2024
Minh Le
An Nguyen
Huy Nguyen
Trang Nguyen
Trang Pham
L. Ngo
Nhat Ho
CLL
431
31
0
23 May 2024
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
International Conference on Learning Representations (ICLR), 2024
Huy Le Nguyen
Pedram Akbarian
Trang Pham
Trang Nguyen
Shujian Zhang
Nhat Ho
MoE
301
2
0
23 May 2024
Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts
Huy Nguyen
Nhat Ho
Alessandro Rinaldo
297
14
0
22 May 2024
Joint-Task Regularization for Partially Labeled Multi-Task Learning
Computer Vision and Pattern Recognition (CVPR), 2024
Kento Nishi
Junsik Kim
Wanhua Li
Hanspeter Pfister
273
6
0
02 Apr 2024
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Boran Han
Shuai Zhang
Xingjian Shi
Markus Reichstein
188
40
0
01 Apr 2024
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
Ruijie Quan
Wenguan Wang
Zhibo Tian
Fan Ma
Yi Yang
179
22
0
29 Mar 2024
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Yuqi Yang
Peng-Tao Jiang
Qibin Hou
Hao Zhang
Jinwei Chen
Yue Liu
MoE
219
54
0
26 Mar 2024
Block Selective Reprogramming for On-device Training of Vision Transformers
Sreetama Sarkar
Souvik Kundu
Kai Zheng
Peter A. Beerel
174
5
0
25 Mar 2024
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data
Computer Vision and Pattern Recognition (CVPR), 2024
Hanrong Ye
Dan Xu
DiffM
177
12
0
22 Mar 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Zhiwu Lu
Guoxing Yang
234
4
0
07 Mar 2024
InterroGate: Learning to Share, Specialize, and Prune Representations for Multi-task Learning
B. Bejnordi
Gaurav Kumar
Amelie Royer
Christos Louizos
Tijmen Blankevoort
Mohsen Ghafoorian
CVBM
181
1
0
26 Feb 2024
Mixtures of Experts Unlock Parameter Scaling for Deep RL
J. Obando-Ceron
Ghada Sokar
Timon Willi
Clare Lyle
Jesse Farebrother
Jakob N. Foerster
Gintare Karolina Dziugaite
Doina Precup
Pablo Samuel Castro
426
56
0
13 Feb 2024
Differentially Private Training of Mixture of Experts Models
Pierre Tholoniat
Huseyin A. Inan
Janardhan Kulkarni
Robert Sim
MoE
144
2
0
11 Feb 2024
On Parameter Estimation in Deviated Gaussian Mixture of Experts
Huy Nguyen
Khai Nguyen
Nhat Ho
174
1
0
07 Feb 2024