Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
arXiv:1701.06538 · 23 January 2017
Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean
MoE
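For orientation before the citation list: the cited paper introduces noisy top-k gating, in which each token is routed to only k of n expert networks, so parameter count can grow without a matching growth in per-token compute. The sketch below is a minimal PyTorch illustration of that gating rule, not the authors' reference implementation; the class and variable names (NoisyTopKGate, w_gate, w_noise) are our own, and the paper's load-balancing losses are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopKGate(nn.Module):
    # Noisy top-k gating in the style of arXiv:1701.06538, Sec. 2.1:
    #   H(x)_i = (x @ W_g)_i + N(0, 1) * softplus((x @ W_noise)_i)
    #   G(x)   = softmax(keep_top_k(H(x), k))
    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        # The paper initializes both gating matrices to zero.
        self.w_gate = nn.Parameter(torch.zeros(d_model, n_experts))
        self.w_noise = nn.Parameter(torch.zeros(d_model, n_experts))
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        clean = x @ self.w_gate                   # raw gate logits
        noise_std = F.softplus(x @ self.w_noise)  # input-dependent noise scale
        logits = clean + torch.randn_like(clean) * noise_std
        # Keep the k largest logits per token; everything else becomes -inf,
        # so softmax assigns those experts exactly zero weight.
        kth = logits.topk(self.k, dim=-1).values[..., -1:]
        sparse = logits.masked_fill(logits < kth, float("-inf"))
        return sparse.softmax(dim=-1)             # sparse mixture weights

# Toy usage: route each of 4 tokens to 2 of 8 experts.
gate = NoisyTopKGate(d_model=16, n_experts=8, k=2)
weights = gate(torch.randn(4, 16))  # shape (4, 8); at most 2 nonzeros per row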

Papers citing "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer"

50 / 495 papers shown
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
Xi Lin, Akshat Shrivastava, Liang Luo, Srinivasan Iyer, Mike Lewis, Gargi Ghosh, Luke Zettlemoyer, Armen Aghajanyan
MoE · 31 Jul 2024
Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models
Mohammed Al-Maamari, Mehdi Ben Amor, Michael Granitzer
KELM · MoE · 28 Jul 2024
A deeper look at depth pruning of LLMs
Shoaib Ahmed Siddiqui, Xin Dong, Greg Heinrich, Thomas Breuel, Jan Kautz, David M. Krueger, Pavlo Molchanov
23 Jul 2024
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, Lingjuan Lyu
DiffM · 22 Jul 2024
MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs
Quang H. Nguyen, Duy C. Hoang, Juliette Decugis, Saurav Manchanda, Nitesh V. Chawla, Khoa D. Doan
15 Jul 2024
Low-Rank Interconnected Adaptation Across Layers
Yibo Zhong, Yao Zhou
OffRL · MoE · 13 Jul 2024
Modality Agnostic Heterogeneous Face Recognition with Switch Style Modulators
Anjith George, Sébastien Marcel
CVBM · 11 Jul 2024
PLeaS -- Merging Models with Permutations and Least Squares
Anshul Nasery, J. Hayase, Pang Wei Koh, Sewoong Oh
MoMe · 02 Jul 2024
Kolmogorov-Arnold Convolutions: Design Principles and Empirical Studies
Ivan Drokin
01 Jul 2024
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li
MoE · 28 Jun 2024
Compositional Models for Estimating Causal Effects
Purva Pruthi, David D. Jensen
CML · 25 Jun 2024
Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction
Bruce Rushing
MoE · 24 Jun 2024
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen, Xinyu Zhao, Tianlong Chen, Yu Cheng
MoE · 17 Jun 2024
Attention as a Hypernetwork
Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu
GNN · 09 Jun 2024
Submodular Framework for Structured-Sparse Optimal Transport
Piyushi Manupriya, Pratik Jawanpuria, Karthik S. Gurumoorthy, SakethaNath Jagarlapudi, Bamdev Mishra
OT · 07 Jun 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Zheshu Song, Jianheng Zhuo, Yifan Yang, Ziyang Ma, Shixiong Zhang, Xie Chen
07 Jun 2024
Mixture-of-Agents Enhances Large Language Model Capabilities
Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou
LLMAG · AIFin · 07 Jun 2024
OCCAM: Towards Cost-Efficient and Accuracy-Aware Classification Inference
Dujian Ding, Bicheng Xu, L. Lakshmanan
VLM · 06 Jun 2024
Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach
Haoyu Han, Juanhui Li, Wei Huang, Xianfeng Tang, Hanqing Lu, Chen Luo, Hui Liu, Jiliang Tang
05 Jun 2024
Scorch: A Library for Sparse Deep Learning
Bobby Yan, Alexander J. Root, Trevor Gale, David Broman, Fredrik Kjolstad
27 May 2024
CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
Xingqun Qi, Hengyuan Zhang, Yatian Wang, J. Pan, Chen Liu, ..., Qixun Zhang, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Qi-fei Liu
DiffM · SLR · 27 May 2024
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
Qichao Shentu, Beibu Li, Kai Zhao, Yang Shu, Zhongwen Rao, Lujia Pan, Bin Yang, Chenjuan Guo
AI4TS · 24 May 2024
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo, Zhenglin Cheng, Xiaoying Tang, Tao Lin
MoE · 23 May 2024
DirectMultiStep: Direct Route Generation for Multistep Retrosynthesis
Yu Shee, Haote Li, Anton Morgunov, Victor S. Batista
22 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min-Ling Zhang
MoE · 18 May 2024
PhysMLE: Generalizable and Priors-Inclusive Multi-task Remote Physiological Measurement
Jiyao Wang, Hao Lu, Ange Wang, Xiao Yang, Ying Chen, Dengbo He, Kaishun Wu
10 May 2024
SUTRA: Scalable Multilingual Language Model Architecture
Abhijit Bendale, Michael Sapienza, Steven Ripplinger, Simon Gibbs, Jaewon Lee, Pranav Mistry
LRM · ELM · 07 May 2024
Evaluation of Few-Shot Learning for Classification Tasks in the Polish Language
Tsimur Hadeliya, D. Kajtoch
27 Apr 2024
Double Mixture: Towards Continual Event Detection from Speech
Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Yinwei Wei, Haomiao Yang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari
20 Apr 2024
Reducing the Barriers to Entry for Foundation Model Training
Paolo Faraboschi, Ellis Giles, Justin Hotard, Konstanty Owczarek, Andrew Wheeler
12 Apr 2024
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Yikang Shen, Zhen Guo, Tianle Cai, Zengyi Qin
MoE · ALM · 11 Apr 2024
Mixture of Low-rank Experts for Transferable AI-Generated Image Detection
Zihan Liu, Hanyi Wang, Yaoyu Kang, Shilin Wang
MoE · 07 Apr 2024
Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors
Victoria Graf, Qin Liu, Muhao Chen
AAML · 02 Apr 2024
Tiny Models are the Computational Saver for Large Models
Qingyuan Wang, B. Cardiff, Antoine Frappé, Benoît Larras, Deepu John
26 Mar 2024
DiPaCo: Distributed Path Composition
Arthur Douillard, Qixuang Feng, Andrei A. Rusu, A. Kuncoro, Yani Donchev, Rachita Chhaparia, Ionel Gog, Marc'Aurelio Ranzato, Jiajun Shen, Arthur Szlam
MoE · 15 Mar 2024
Video Relationship Detection Using Mixture of Experts
A. Shaabana, Zahra Gharaee, Paul Fieguth
06 Mar 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo-Wen Zhang, Junchi Yan, Hongsheng Li
MoE · 22 Feb 2024
LLMBind: A Unified Modality-Task Integration Framework
Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, ..., Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li-ming Yuan
MLLM · 22 Feb 2024
LiRank: Industrial Large Scale Ranking Models at LinkedIn
Fedor Borisyuk, Mingzhou Zhou, Qingquan Song, Siyu Zhu, B. Tiwana, ..., Chen-Chen Jiang, Haichao Wei, Maneesh Varshney, Amol Ghoting, Souvik Ghosh
10 Feb 2024
Multimodal Clinical Trial Outcome Prediction with Large Language Models
Wenhao Zheng, Dongsheng Peng, Hongxia Xu, Yun-Qing Li, Hongtu Zhu, Tianfan Fu, Huaxiu Yao
09 Feb 2024
Large Language Models: A Survey
Shervin Minaee, Tomáš Mikolov, Narjes Nikzad, M. Asgari-Chenaghlu, R. Socher, Xavier Amatriain, Jianfeng Gao
ALM · LM&MA · ELM · 09 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu, Renrui Zhang, Longtian Qiu, Siyuan Huang, Weifeng Lin, ..., Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao, Peng Gao
MLLM · 08 Feb 2024
Approximation Rates and VC-Dimension Bounds for (P)ReLU MLP Mixture of Experts
Anastasis Kratsios, Haitz Sáez de Ocáriz Borde, Takashi Furuya, Marc T. Law
MoE · 05 Feb 2024
Rethinking RGB Color Representation for Image Restoration Models
Jaerin Lee, J. Park, Sungyong Baik, Kyoung Mu Lee
05 Feb 2024
FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion
Xing Han, Huy Nguyen, Carl Harris, Nhat Ho, S. Saria
MoE · 05 Feb 2024
A Hyper-Transformer model for Controllable Pareto Front Learning with Split Feasibility Constraints
Tran Anh Tuan, Nguyen Viet Dung, Tran Ngoc Thang
04 Feb 2024
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Weijiao Zhang, Jindong Han, Zhao Xu, Hang Ni, Hao Liu, Hui Xiong
AI4CE · 30 Jan 2024
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs
Shaoxiang Chen, Zequn Jie, Lin Ma
MoE · 29 Jan 2024
LocMoE: A Low-Overhead MoE for Large Language Model Training
Jing Li, Zhijie Sun, Xuan He, Li Zeng, Yi Lin, Entong Li, Binfan Zheng, Rongqian Zhao, Xin Chen
MoE · 25 Jan 2024
M³TN: Multi-gate Mixture-of-Experts based Multi-valued Treatment Network for Uplift Modeling
Zexu Sun, Xu Chen
24 Jan 2024