Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
arXiv: 1701.06538 · 23 January 2017
Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean
MoE

Papers citing "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" (50 of 495 papers shown)

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Xi Lin, Akshat Shrivastava, Liang Luo, Srinivasan Iyer, Mike Lewis, Gargi Ghosh, Luke Zettlemoyer, Armen Aghajanyan
MoE · 31 Jul 2024

Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models
Mohammed Al-Maamari, Mehdi Ben Amor, Michael Granitzer
KELM · MoE · 28 Jul 2024

A deeper look at depth pruning of LLMs
Shoaib Ahmed Siddiqui, Xin Dong, Greg Heinrich, Thomas Breuel, Jan Kautz, David M. Krueger, Pavlo Molchanov
23 Jul 2024

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, Lingjuan Lyu
DiffM · 22 Jul 2024

MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs
Quang H. Nguyen, Duy C. Hoang, Juliette Decugis, Saurav Manchanda, Nitesh V. Chawla, Khoa D. Doan
15 Jul 2024

Low-Rank Interconnected Adaptation Across Layers
Yibo Zhong, Yao Zhou
OffRL · MoE · 13 Jul 2024

Modality Agnostic Heterogeneous Face Recognition with Switch Style Modulators
Anjith George, Sébastien Marcel
CVBM · 11 Jul 2024

PLeaS -- Merging Models with Permutations and Least Squares
Anshul Nasery, J. Hayase, Pang Wei Koh, Sewoong Oh
MoMe · 02 Jul 2024

Kolmogorov-Arnold Convolutions: Design Principles and Empirical Studies
Ivan Drokin
01 Jul 2024

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li
MoE · 28 Jun 2024

Compositional Models for Estimating Causal Effects
Purva Pruthi, David D. Jensen
CML · 25 Jun 2024

Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction
Bruce Rushing
MoE · 24 Jun 2024

MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen, Xinyu Zhao, Tianlong Chen, Yu Cheng
MoE · 17 Jun 2024

Attention as a Hypernetwork
Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu
GNN · 09 Jun 2024

Submodular Framework for Structured-Sparse Optimal Transport
Piyushi Manupriya, Pratik Jawanpuria, Karthik S. Gurumoorthy, Saketha Nath Jagarlapudi, Bamdev Mishra
OT · 07 Jun 2024

LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Zheshu Song, Jianheng Zhuo, Yifan Yang, Ziyang Ma, Shixiong Zhang, Xie Chen
07 Jun 2024

Mixture-of-Agents Enhances Large Language Model Capabilities
Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou
LLMAG · AIFin · 07 Jun 2024

OCCAM: Towards Cost-Efficient and Accuracy-Aware Classification Inference
Dujian Ding, Bicheng Xu, L. Lakshmanan
VLM · 06 Jun 2024

Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach
Haoyu Han, Juanhui Li, Wei Huang, Xianfeng Tang, Hanqing Lu, Chen Luo, Hui Liu, Jiliang Tang
05 Jun 2024

Scorch: A Library for Sparse Deep Learning
Bobby Yan, Alexander J. Root, Trevor Gale, David Broman, Fredrik Kjolstad
27 May 2024

CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
Xingqun Qi, Hengyuan Zhang, Yatian Wang, J. Pan, Chen Liu, ..., Qixun Zhang, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Qi-fei Liu
DiffM · SLR · 27 May 2024

Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
Qichao Shentu, Beibu Li, Kai Zhao, Yang Shu, Zhongwen Rao, Lujia Pan, Bin Yang, Chenjuan Guo
AI4TS · 24 May 2024

Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo, Zhenglin Cheng, Xiaoying Tang, Tao Lin
MoE · 23 May 2024

DirectMultiStep: Direct Route Generation for Multistep Retrosynthesis
Yu Shee, Haote Li, Anton Morgunov, Victor S. Batista
22 May 2024

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min-Ling Zhang
MoE · 18 May 2024

PhysMLE: Generalizable and Priors-Inclusive Multi-task Remote Physiological Measurement
Jiyao Wang, Hao Lu, Ange Wang, Xiao Yang, Ying Chen, Dengbo He, Kaishun Wu
10 May 2024

SUTRA: Scalable Multilingual Language Model Architecture
Abhijit Bendale, Michael Sapienza, Steven Ripplinger, Simon Gibbs, Jaewon Lee, Pranav Mistry
LRM · ELM · 07 May 2024

Evaluation of Few-Shot Learning for Classification Tasks in the Polish Language
Tsimur Hadeliya, D. Kajtoch
27 Apr 2024

Double Mixture: Towards Continual Event Detection from Speech
Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Yinwei Wei, Haomiao Yang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari
20 Apr 2024

Reducing the Barriers to Entry for Foundation Model Training
Paolo Faraboschi, Ellis Giles, Justin Hotard, Konstanty Owczarek, Andrew Wheeler
12 Apr 2024

JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Yikang Shen, Zhen Guo, Tianle Cai, Zengyi Qin
MoE · ALM · 11 Apr 2024

Mixture of Low-rank Experts for Transferable AI-Generated Image Detection
Zihan Liu, Hanyi Wang, Yaoyu Kang, Shilin Wang
MoE · 07 Apr 2024

Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors
Victoria Graf, Qin Liu, Muhao Chen
AAML · 02 Apr 2024

Tiny Models are the Computational Saver for Large Models
Qingyuan Wang, B. Cardiff, Antoine Frappé, Benoît Larras, Deepu John
26 Mar 2024

DiPaCo: Distributed Path Composition
Arthur Douillard, Qixuang Feng, Andrei A. Rusu, A. Kuncoro, Yani Donchev, Rachita Chhaparia, Ionel Gog, Marc'Aurelio Ranzato, Jiajun Shen, Arthur Szlam
MoE · 15 Mar 2024

Video Relationship Detection Using Mixture of Experts
A. Shaabana, Zahra Gharaee, Paul Fieguth
06 Mar 2024

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo-Wen Zhang, Junchi Yan, Hongsheng Li
MoE · 22 Feb 2024

LLMBind: A Unified Modality-Task Integration Framework
Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, ..., Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li-ming Yuan
MLLM · 22 Feb 2024

LiRank: Industrial Large Scale Ranking Models at LinkedIn
Fedor Borisyuk, Mingzhou Zhou, Qingquan Song, Siyu Zhu, B. Tiwana, ..., Chen-Chen Jiang, Haichao Wei, Maneesh Varshney, Amol Ghoting, Souvik Ghosh
10 Feb 2024

10 Feb 2024
Multimodal Clinical Trial Outcome Prediction with Large Language Models
Wenhao Zheng
Dongsheng Peng
Hongxia Xu
Yun-Qing Li
Hongtu Zhu
Tianfan Fu
Huaxiu Yao
Huaxiu Yao
50
5
0
09 Feb 2024
Large Language Models: A Survey
Shervin Minaee, Tomáš Mikolov, Narjes Nikzad, M. Asgari-Chenaghlu, R. Socher, Xavier Amatriain, Jianfeng Gao
ALM · LM&MA · ELM · 09 Feb 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu, Renrui Zhang, Longtian Qiu, Siyuan Huang, Weifeng Lin, ..., Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao, Peng Gao
MLLM · 08 Feb 2024

Approximation Rates and VC-Dimension Bounds for (P)ReLU MLP Mixture of Experts
Anastasis Kratsios, Haitz Sáez de Ocáriz Borde, Takashi Furuya, Marc T. Law
MoE · 05 Feb 2024

Rethinking RGB Color Representation for Image Restoration Models
Jaerin Lee, J. Park, Sungyong Baik, Kyoung Mu Lee
05 Feb 2024

FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion
Xing Han, Huy Nguyen, Carl Harris, Nhat Ho, S. Saria
MoE · 05 Feb 2024

A Hyper-Transformer model for Controllable Pareto Front Learning with Split Feasibility Constraints
Tran Anh Tuan, Nguyen Viet Dung, Tran Ngoc Thang
04 Feb 2024

Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Weijiao Zhang, Jindong Han, Zhao Xu, Hang Ni, Hao Liu, Hui Xiong
AI4CE · 30 Jan 2024

LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs
Shaoxiang Chen, Zequn Jie, Lin Ma
MoE · 29 Jan 2024

LocMoE: A Low-Overhead MoE for Large Language Model Training
Jing Li, Zhijie Sun, Xuan He, Li Zeng, Yi Lin, Entong Li, Binfan Zheng, Rongqian Zhao, Xin Chen
MoE · 25 Jan 2024

M³TN: Multi-gate Mixture-of-Experts based Multi-valued Treatment Network for Uplift Modeling
Zexu Sun, Xu Chen
24 Jan 2024