M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design (arXiv:2210.14793)
Neural Information Processing Systems (NeurIPS), 2022
26 October 2022
Hanxue Liang, Zhiwen Fan, Rishov Sarkar, Ziyu Jiang, Tianlong Chen, Kai Zou, Yu Cheng, Cong Hao, Zinan Lin
MoE
ArXiv (abs) · PDF · HTML · GitHub (118★)
Papers citing "M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design" (21 of 71 shown)
On Least Square Estimation in Softmax Gating Mixture of Experts
International Conference on Machine Learning (ICML), 2024
Huy Nguyen, Nhat Ho, Alessandro Rinaldo
05 Feb 2024
Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
International Conference on Machine Learning (ICML), 2024
Huy Nguyen, Pedram Akbarian, Nhat Ho
MoE
25 Jan 2024
Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation
Rongyu Zhang, Yulin Luo, Jiaming Liu, Huanrui Yang, Zhen Dong, ..., Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Yuan Du, Shanghang Zhang
MoMe, MoE
27 Dec 2023
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, Malka N. Halgamuge
18 Dec 2023
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Computer Vision and Pattern Recognition (CVPR), 2023
Jialin Wu, Xia Hu, Yaqing Wang, Bo Pang, Radu Soricut
MoE
01 Dec 2023
SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
Conference on Machine Learning and Systems (MLSys), 2023
Zhixu Du, Shiyu Li, Yuhao Wu, Xiangyu Jiang, Jingwei Sun, Qilin Zheng, Yongkai Wu, Ang Li, Hai Helen Li, Yiran Chen
MoE
29 Oct 2023
A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
International Conference on Machine Learning (ICML), 2023
Huy Nguyen, Pedram Akbarian, TrungTin Nguyen, Nhat Ho
22 Oct 2023
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
International Conference on Learning Representations (ICLR), 2023
Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen
MoMe
02 Oct 2023
Multi-task Learning with 3D-Aware Regularization
International Conference on Learning Representations (ICLR), 2023
Weihong Li, Jingyu Sun, A. Leonardis, Hakan Bilen
02 Oct 2023
Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts
International Conference on Learning Representations (ICLR), 2023
Huy Nguyen, Pedram Akbarian, Fanqi Yan, Nhat Ho
MoE
25 Sep 2023
SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Rui Kong, Yuanchun Li, Qingtian Feng, Weijun Wang, Xiaozhou Ye, Ye Ouyang, Lingyu Kong, Yunxin Liu
MoE
29 Aug 2023
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
IEEE International Conference on Computer Vision (ICCV), 2023
Wenyan Cong, Hanxue Liang, Peihao Wang, Zhiwen Fan, Tianlong Chen, M. Varma, Yi Wang, Zinan Lin
MoE
22 Aug 2023
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts
IEEE International Conference on Computer Vision (ICCV), 2023
Hanrong Ye, Dan Xu
MoE
28 Jul 2023
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
IEEE International Conference on Computer Vision (ICCV), 2023
Yi-Syuan Chen, Yun-Zhu Song, Cheng Yu Yeo, Bei Liu, Jianlong Fu, Hong-Han Shuai
VLM, LRM
15 Jul 2023
Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts
Rishov Sarkar, Hanxue Liang, Zhiwen Fan, Zinan Lin, Cong Hao
MoE
30 May 2023
Demystifying Softmax Gating Function in Gaussian Mixture of Experts
Neural Information Processing Systems (NeurIPS), 2023
Huy Nguyen, TrungTin Nguyen, Nhat Ho
05 May 2023
AdaMTL: Adaptive Input-dependent Inference for Efficient Multi-Task Learning
Marina Neseem, Ahmed A. Agiza, Sherief Reda
17 Apr 2023
Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling
Neural Information Processing Systems (NeurIPS), 2023
Haotao Wang, Ziyu Jiang, Yuning You, Yan Han, Gaowen Liu, Jayanth Srinivasa, Ramana Rao Kompella, Zinan Lin
06 Apr 2023
Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers
Shiwei Liu, Zinan Lin
06 Feb 2023
Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners
Zitian Chen, Songlin Yang, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, E. Learned-Miller, Chuang Gan
MoE
15 Dec 2022
Accelerating Distributed MoE Training and Inference with Lina
USENIX Annual Technical Conference (USENIX ATC), 2022
Jiamin Li, Yimin Jiang, Yibo Zhu, Cong Wang, Hong-Yu Xu
MoE
31 Oct 2022