Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1701.06538
Cited By
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
23 January 2017
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
MoE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer"
50 / 499 papers shown
Title
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs
Shaoxiang Chen
Zequn Jie
Lin Ma
MoE
45
46
0
29 Jan 2024
LocMoE: A Low-Overhead MoE for Large Language Model Training
Jing Li
Zhijie Sun
Xuan He
Li Zeng
Yi Lin
Entong Li
Binfan Zheng
Rongqian Zhao
Xin Chen
MoE
30
11
0
25 Jan 2024
M
3
^3
3
TN: Multi-gate Mixture-of-Experts based Multi-valued Treatment Network for Uplift Modeling
Zexu Sun
Xu Chen
27
3
0
24 Jan 2024
HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
Shiwei Zhang
Lansong Diao
Chuan Wu
Zongyan Cao
Siyu Wang
Wei Lin
43
12
0
11 Jan 2024
GLIMPSE: Generalized Local Imaging with MLPs
AmirEhsan Khorashadizadeh
Valentin Debarnot
Tianlin Liu
Ivan Dokmanić
36
1
0
01 Jan 2024
Machine learning and domain decomposition methods -- a survey
A. Klawonn
M. Lanser
J. Weber
AI4CE
24
7
0
21 Dec 2023
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
41
31
0
08 Dec 2023
Enhancing Molecular Property Prediction via Mixture of Collaborative Experts
Xu Yao
Shuang Liang
Songqiao Han
Hailiang Huang
29
4
0
06 Dec 2023
Language-driven All-in-one Adverse Weather Removal
Hao Yang
Liyuan Pan
Yan Yang
Wei Liang
VLM
KELM
29
18
0
03 Dec 2023
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
47
0
0
01 Dec 2023
Gene-MOE: A sparsely gated prognosis and classification framework exploiting pan-cancer genomic information
Xiangyu Meng
Xue Li
Qing Yang
Huanhuan Dai
Lian Qiao
Hongzhen Ding
Long Hao
Xun Wang
11
0
0
29 Nov 2023
Bidirectional Reactive Programming for Machine Learning
D. Potop-Butucaru
Albert Cohen
Gordon Plotkin
Hugo Pompougnac
KELM
AI4CE
16
0
0
28 Nov 2023
Conditional Prompt Tuning for Multimodal Fusion
Ruixia Jiang
Lingbo Liu
Changwen Chen
22
0
0
28 Nov 2023
HOPE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts
Do Huu Dat
Po Yuan Mao
Tien Hoang Nguyen
Wray L. Buntine
Bennamoun
56
1
0
23 Nov 2023
More Samples or More Prompts? Exploring Effective In-Context Sampling for LLM Few-Shot Prompt Engineering
Bingsheng Yao
Guiming Hardy Chen
Ruishi Zou
Yuxuan Lu
Jiachen Li
Shao Zhang
Yisi Sang
Sijia Liu
James A. Hendler
Dakuo Wang
45
13
0
16 Nov 2023
SiRA: Sparse Mixture of Low Rank Adaptation
Yun Zhu
Nevan Wichers
Chu-Cheng Lin
Xinyi Wang
Tianlong Chen
...
Han Lu
Canoee Liu
Liangchen Luo
Jindong Chen
Lei Meng
MoE
25
27
0
15 Nov 2023
Mixture of Weak & Strong Experts on Graphs
Hanqing Zeng
Hanjia Lyu
Diyi Hu
Yinglong Xia
Jiebo Luo
28
3
0
09 Nov 2023
Mixture-of-Experts for Open Set Domain Adaptation: A Dual-Space Detection Approach
Zhenbang Du
Jiayu An
Yunlu Tu
Jiahao Hong
Dongrui Wu
MoE
28
1
0
01 Nov 2023
Understanding the Effects of Projectors in Knowledge Distillation
Yudong Chen
Sen Wang
Jiajun Liu
Xuwei Xu
Frank de Hoog
Brano Kusy
Zi Huang
26
0
0
26 Oct 2023
A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
Huy Nguyen
Pedram Akbarian
TrungTin Nguyen
Nhat Ho
32
10
0
22 Oct 2023
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
Boan Liu
Liang Ding
Li Shen
Keqin Peng
Yu Cao
Dazhao Cheng
Dacheng Tao
MoE
36
7
0
15 Oct 2023
Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach
KB Vimal
Saketh Bachu
Tanmay Garg
Niveditha Lakshmi Narasimhan
Raghavan Konuru
Vineeth N. Balasubramanian
42
2
0
05 Sep 2023
MvFS: Multi-view Feature Selection for Recommender System
Youngjune Lee
Yeongjong Jeong
Keunchan Park
SeongKu Kang
30
12
0
05 Sep 2023
Enhancing Mobile Face Anti-Spoofing: A Robust Framework for Diverse Attack Types under Screen Flash
Weihua Liu
Chaochao Lin
Yunzhen Yan
CVBM
AAML
15
1
0
29 Aug 2023
SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget
Rui Kong
Yuanchun Li
Qingtian Feng
Weijun Wang
Xiaozhou Ye
Ye Ouyang
L. Kong
Yunxin Liu
MoE
34
8
0
29 Aug 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
60
9
0
23 Aug 2023
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
Wenyan Cong
Hanxue Liang
Peihao Wang
Zhiwen Fan
Tianlong Chen
M. Varma
Yi Wang
Zhangyang Wang
MoE
34
21
0
22 Aug 2023
Robust Mixture-of-Expert Training for Convolutional Neural Networks
Yihua Zhang
Ruisi Cai
Tianlong Chen
Guanhua Zhang
Huan Zhang
Pin-Yu Chen
Shiyu Chang
Zhangyang Wang
Sijia Liu
MoE
AAML
OOD
34
16
0
19 Aug 2023
AST-MHSA : Code Summarization using Multi-Head Self-Attention
Y. Nagaraj
U. Gupta
18
1
0
10 Aug 2023
From Sparse to Soft Mixtures of Experts
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
121
114
0
02 Aug 2023
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts
Hanrong Ye
Dan Xu
MoE
42
26
0
28 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
22
41
0
12 Jul 2023
Attention Weighted Mixture of Experts with Contrastive Learning for Personalized Ranking in E-commerce
Juan Gong
Zhe Chen
Chao Ma
Zhuojian Xiao
Hong Wang
Guoyu Tang
Lin Liu
Sulong Xu
Bo Long
Yunjiang Jiang
16
4
0
08 Jun 2023
Neural Markov Jump Processes
Patrick Seifner
Ramses J. Sanchez
BDL
35
7
0
31 May 2023
Lifting the Curse of Capacity Gap in Distilling Language Models
Chen Zhang
Yang Yang
Jiahao Liu
Jingang Wang
Yunsen Xian
Benyou Wang
Dawei Song
MoE
32
19
0
20 May 2023
Perpetual Humanoid Control for Real-time Simulated Avatars
Zhengyi Luo
Jinkun Cao
Alexander W. Winkler
Kris M. Kitani
Weipeng Xu
58
88
0
10 May 2023
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
Ye Lin
Shuhan Zhou
Yanyang Li
Anxiang Ma
Tong Xiao
Jingbo Zhu
35
0
0
10 May 2023
Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
Da Xu
Maha Elbayad
Kenton W. Murray
Jean Maillard
Vedanuj Goswami
MoE
47
3
0
03 May 2023
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Tao Lei
Junwen Bai
Siddhartha Brahma
Joshua Ainslie
Kenton Lee
...
Vincent Zhao
Yuexin Wu
Bo-wen Li
Yu Zhang
Ming-Wei Chang
BDL
AI4CE
30
54
0
11 Apr 2023
PoseFusion: Robust Object-in-Hand Pose Estimation with SelectLSTM
Yuyang Tu
Junnan Jiang
Shuang Li
Norman Hendrich
Miaopeng Li
Jianwei Zhang
24
7
0
10 Apr 2023
Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling
Haotao Wang
Ziyu Jiang
Yuning You
Yan Han
Gaowen Liu
Jayanth Srinivasa
Ramana Rao Kompella
Zhangyang Wang
26
28
0
06 Apr 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
Hanlin Wang
Yilu Wu
Sheng Guo
Limin Wang
VGen
DiffM
73
30
0
26 Mar 2023
WM-MoE: Weather-aware Multi-scale Mixture-of-Experts for Blind Adverse Weather Removal
Yulin Luo
Rui Zhao
Xi Wei
Jinwei Chen
Yijie Lu
Shenghao Xie
Tianyu Wang
Ruiqin Xiong
Ming Lu
Shanghang Zhang
31
3
0
24 Mar 2023
Memorization Capacity of Neural Networks with Conditional Computation
Erdem Koyuncu
38
4
0
20 Mar 2023
CoLT5: Faster Long-Range Transformers with Conditional Computation
Joshua Ainslie
Tao Lei
Michiel de Jong
Santiago Ontañón
Siddhartha Brahma
...
Mandy Guo
James Lee-Thorp
Yi Tay
Yun-hsuan Sung
Sumit Sanghai
LLMAG
33
63
0
17 Mar 2023
Adaptive Rotated Convolution for Rotated Object Detection
Yifan Pu
Yiru Wang
Zhuofan Xia
Yizeng Han
Yulin Wang
Weihao Gan
Zidong Wang
S. Song
Gao Huang
23
76
0
14 Mar 2023
Simultaneous Action Recognition and Human Whole-Body Motion and Dynamics Prediction from Wearable Sensors
Kourosh Darvish
S. Ivaldi
Daniele Pucci
AI4CE
28
2
0
14 Mar 2023
ViM: Vision Middleware for Unified Downstream Transferring
Yutong Feng
Biao Gong
Jianwen Jiang
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
32
1
0
13 Mar 2023
HiNet: Novel Multi-Scenario & Multi-Task Learning with Hierarchical Information Extraction
Jie Zhou
Xia Cao
Wenhao Li
Lin Bo
Kun Zhang
Chuan Luo
Qian Yu
29
24
0
10 Mar 2023
SMoA: Sparse Mixture of Adapters to Mitigate Multiple Dataset Biases
Yanchen Liu
Jing Yang
Yan Chen
Jing Liu
Huaqin Wu
MoE
47
2
0
28 Feb 2023
Previous
1
2
3
4
5
6
...
8
9
10
Next