M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

26 October 2022
Hanxue Liang
Zhiwen Fan
Rishov Sarkar
Ziyu Jiang
Tianlong Chen
Kai Zou
Yu Cheng
Cong Hao
Zhangyang Wang
    MoE

Papers citing "M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design"

50 / 58 papers shown
Injecting Imbalance Sensitivity for Multi-Task Learning
Zhipeng Zhou
Liu Liu
Peilin Zhao
Wei Gong
11 Mar 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu
Sen Lin
MoE
10 Mar 2025
Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning
Hanwen Zhong
Jiaxin Chen
Yutong Zhang
Di Huang
Yunhong Wang
MoE
12 Jan 2025
Generate to Discriminate: Expert Routing for Continual Learning
Yewon Byun
Sanket Vaibhav Mehta
Saurabh Garg
Emma Strubell
Michael Oberst
Bryan Wilder
Zachary Chase Lipton
31 Dec 2024
HiMoE: Heterogeneity-Informed Mixture-of-Experts for Fair Spatial-Temporal Forecasting
Shaohan Yu
Pan Deng
Yu Zhao
J. Liu
Zi'ang Wang
MoE
30 Nov 2024
HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy
Shuqing Luo
Jie Peng
Pingzhi Li
Tianlong Chen
MoE
02 Nov 2024
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Ruisi Cai
Yeonju Ro
Geon-Woo Kim
Peihao Wang
Babak Ehteshami Bejnordi
Aditya Akella
Z. Wang
MoE
24 Oct 2024
ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
Xumeng Han
Longhui Wei
Zhiyang Dou
Zipeng Wang
Chenhui Qiang
Xin He
Yingfei Sun
Zhenjun Han
Qi Tian
MoE
21 Oct 2024
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Yuxiang Lu
Shengcao Cao
Yu-xiong Wang
18 Oct 2024
Quadratic Gating Functions in Mixture of Experts: A Statistical Insight
Pedram Akbarian
Huy Le Nguyen
Xing Han
Nhat Ho
MoE
15 Oct 2024
Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization
Hongtao Wu
Yijun Yang
Angelica I Aviles-Rivero
Jingjing Ren
Sixiang Chen
Haoyu Chen
Lei Zhu
10 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai Helen Li
Yiran Chen
08 Oct 2024
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild
Xinyu Zhao
Guoheng Sun
Ruisi Cai
Yukun Zhou
Pingzhi Li
...
Binhang Yuan
Hongyi Wang
Ang Li
Zhangyang Wang
Tianlong Chen
MoMe
ALM
07 Oct 2024
Panoptic Perception for Autonomous Driving: A Survey
Yunge Li
Lanyu Xu
27 Aug 2024
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning
Yixiao Wang
Yifei Zhang
Mingxiao Huo
Ran Tian
Xiang Zhang
...
Chenfeng Xu
Pengliang Ji
Wei Zhan
Mingyu Ding
M. Tomizuka
MoE
01 Jul 2024
Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters
Yuhang Zhou
Zihua Zhao
Haolin Li
Siyuan Du
Jiangchao Yao
Ya Zhang
Yanfeng Wang
MoMe
MoE
14 Jun 2024
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Xingkui Zhu
Yiran Guan
Dingkang Liang
Yuchao Chen
Yuliang Liu
Xiang Bai
MoE
07 Jun 2024
Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation
Rongyu Zhang
Aosong Cheng
Yulin Luo
Gaole Dai
Huanrui Yang
...
Ran Xu
Li Du
Yuan Du
Yanbing Jiang
Shanghang Zhang
MoE
TTA
26 May 2024
Mixture of Experts Meets Prompt-Based Continual Learning
Minh Le
An Nguyen
Huy Nguyen
Trang Nguyen
Trang Pham
L. Ngo
Nhat Ho
CLL
23 May 2024
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
Huy Le Nguyen
Pedram Akbarian
Trang Pham
Trang Nguyen
Shujian Zhang
Nhat Ho
MoE
23 May 2024
Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts
Huy Nguyen
Nhat Ho
Alessandro Rinaldo
22 May 2024
Joint-Task Regularization for Partially Labeled Multi-Task Learning
Kento Nishi
Junsik Kim
Wanhua Li
Hanspeter Pfister
02 Apr 2024
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Boran Han
Shuai Zhang
Xingjian Shi
Markus Reichstein
01 Apr 2024
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
Ruijie Quan
Wenguan Wang
Zhibo Tian
Fan Ma
Yi Yang
29 Mar 2024
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Yuqi Yang
Peng-Tao Jiang
Qibin Hou
Hao Zhang
Jinwei Chen
Bo-wen Li
MoE
26 Mar 2024
Block Selective Reprogramming for On-device Training of Vision Transformers
Sreetama Sarkar
Souvik Kundu
Kai Zheng
P. Beerel
25 Mar 2024
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data
Hanrong Ye
Dan Xu
DiffM
22 Mar 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Guoxing Yang
Zhiwu Lu
07 Mar 2024
InterroGate: Learning to Share, Specialize, and Prune Representations for Multi-task Learning
B. Bejnordi
Gaurav Kumar
Amelie Royer
Christos Louizos
Tijmen Blankevoort
Mohsen Ghafoorian
CVBM
26 Feb 2024
Mixtures of Experts Unlock Parameter Scaling for Deep RL
J. Obando-Ceron
Ghada Sokar
Timon Willi
Clare Lyle
Jesse Farebrother
Jakob N. Foerster
Gintare Karolina Dziugaite
Doina Precup
Pablo Samuel Castro
13 Feb 2024
Differentially Private Training of Mixture of Experts Models
Pierre Tholoniat
Huseyin A. Inan
Janardhan Kulkarni
Robert Sim
MoE
11 Feb 2024
On Parameter Estimation in Deviated Gaussian Mixture of Experts
Huy Nguyen
Khai Nguyen
Nhat Ho
07 Feb 2024
On Least Square Estimation in Softmax Gating Mixture of Experts
Huy Nguyen
Nhat Ho
Alessandro Rinaldo
05 Feb 2024
Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
Huy Nguyen
Pedram Akbarian
Nhat Ho
MoE
25 Jan 2024
Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation
Rongyu Zhang
Yulin Luo
Jiaming Liu
Huanrui Yang
Zhen Dong
...
Tomoyuki Okuno
Yohei Nakata
Kurt Keutzer
Yuan Du
Shanghang Zhang
MoMe
MoE
27 Dec 2023
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
18 Dec 2023
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu
Xia Hu
Yaqing Wang
Bo Pang
Radu Soricut
MoE
01 Dec 2023
SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
Zhixu Du
Shiyu Li
Yuhao Wu
Xiangyu Jiang
Jingwei Sun
Qilin Zheng
Yongkai Wu
Ang Li
Hai Helen Li
Yiran Chen
MoE
29 Oct 2023
A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
Huy Nguyen
Pedram Akbarian
TrungTin Nguyen
Nhat Ho
22 Oct 2023
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
Pingzhi Li
Zhenyu (Allen) Zhang
Prateek Yadav
Yi-Lin Sung
Yu Cheng
Mohit Bansal
Tianlong Chen
MoMe
02 Oct 2023
Multi-task Learning with 3D-Aware Regularization
Weihong Li
Steven G. McDonagh
A. Leonardis
Hakan Bilen
02 Oct 2023
Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts
Huy Nguyen
Pedram Akbarian
Fanqi Yan
Nhat Ho
MoE
25 Sep 2023
SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget
Rui Kong
Yuanchun Li
Qingtian Feng
Weijun Wang
Xiaozhou Ye
Ye Ouyang
L. Kong
Yunxin Liu
MoE
29 Aug 2023
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
Wenyan Cong
Hanxue Liang
Peihao Wang
Zhiwen Fan
Tianlong Chen
M. Varma
Yi Wang
Zhangyang Wang
MoE
22 Aug 2023
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts
Hanrong Ye
Dan Xu
MoE
28 Jul 2023
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
Yi-Syuan Chen
Yun-Zhu Song
Cheng Yu Yeo
Bei Liu
Jianlong Fu
Hong-Han Shuai
VLM
LRM
15 Jul 2023
Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts
Rishov Sarkar
Hanxue Liang
Zhiwen Fan
Zhangyang Wang
Cong Hao
MoE
30 May 2023
Demystifying Softmax Gating Function in Gaussian Mixture of Experts
Huy Nguyen
TrungTin Nguyen
Nhat Ho
05 May 2023
AdaMTL: Adaptive Input-dependent Inference for Efficient Multi-Task Learning
Marina Neseem
Ahmed A. Agiza
Sherief Reda
17 Apr 2023
Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling
Haotao Wang
Ziyu Jiang
Yuning You
Yan Han
Gaowen Liu
Jayanth Srinivasa
Ramana Rao Kompella
Zhangyang Wang
06 Apr 2023