ResearchTrend.AI
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

23 January 2017
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
    MoE

Papers citing "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer"

50 / 499 papers shown
LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs
Shaoxiang Chen
Zequn Jie
Lin Ma
MoE
45
46
0
29 Jan 2024
LocMoE: A Low-Overhead MoE for Large Language Model Training
Jing Li
Zhijie Sun
Xuan He
Li Zeng
Yi Lin
Entong Li
Binfan Zheng
Rongqian Zhao
Xin Chen
MoE
30
11
0
25 Jan 2024
M$^3$TN: Multi-gate Mixture-of-Experts based Multi-valued Treatment Network for Uplift Modeling
Zexu Sun
Xu Chen
27
3
0
24 Jan 2024
HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
Shiwei Zhang
Lansong Diao
Chuan Wu
Zongyan Cao
Siyu Wang
Wei Lin
43
12
0
11 Jan 2024
GLIMPSE: Generalized Local Imaging with MLPs
AmirEhsan Khorashadizadeh
Valentin Debarnot
Tianlin Liu
Ivan Dokmanić
36
1
0
01 Jan 2024
Machine learning and domain decomposition methods -- a survey
A. Klawonn
M. Lanser
J. Weber
AI4CE
24
7
0
21 Dec 2023
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
41
31
0
08 Dec 2023
Enhancing Molecular Property Prediction via Mixture of Collaborative Experts
Xu Yao
Shuang Liang
Songqiao Han
Hailiang Huang
29
4
0
06 Dec 2023
Language-driven All-in-one Adverse Weather Removal
Hao Yang
Liyuan Pan
Yan Yang
Wei Liang
VLM
KELM
29
18
0
03 Dec 2023
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
47
0
0
01 Dec 2023
Gene-MOE: A sparsely gated prognosis and classification framework exploiting pan-cancer genomic information
Xiangyu Meng
Xue Li
Qing Yang
Huanhuan Dai
Lian Qiao
Hongzhen Ding
Long Hao
Xun Wang
11
0
0
29 Nov 2023
Bidirectional Reactive Programming for Machine Learning
D. Potop-Butucaru
Albert Cohen
Gordon Plotkin
Hugo Pompougnac
KELM
AI4CE
16
0
0
28 Nov 2023
Conditional Prompt Tuning for Multimodal Fusion
Ruixia Jiang
Lingbo Liu
Changwen Chen
22
0
0
28 Nov 2023
HOPE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts
Do Huu Dat
Po Yuan Mao
Tien Hoang Nguyen
Wray L. Buntine
Bennamoun
56
1
0
23 Nov 2023
More Samples or More Prompts? Exploring Effective In-Context Sampling for LLM Few-Shot Prompt Engineering
Bingsheng Yao
Guiming Hardy Chen
Ruishi Zou
Yuxuan Lu
Jiachen Li
Shao Zhang
Yisi Sang
Sijia Liu
James A. Hendler
Dakuo Wang
45
13
0
16 Nov 2023
SiRA: Sparse Mixture of Low Rank Adaptation
Yun Zhu
Nevan Wichers
Chu-Cheng Lin
Xinyi Wang
Tianlong Chen
...
Han Lu
Canoee Liu
Liangchen Luo
Jindong Chen
Lei Meng
MoE
25
27
0
15 Nov 2023
Mixture of Weak & Strong Experts on Graphs
Hanqing Zeng
Hanjia Lyu
Diyi Hu
Yinglong Xia
Jiebo Luo
28
3
0
09 Nov 2023
Mixture-of-Experts for Open Set Domain Adaptation: A Dual-Space Detection Approach
Zhenbang Du
Jiayu An
Yunlu Tu
Jiahao Hong
Dongrui Wu
MoE
28
1
0
01 Nov 2023
Understanding the Effects of Projectors in Knowledge Distillation
Yudong Chen
Sen Wang
Jiajun Liu
Xuwei Xu
Frank de Hoog
Brano Kusy
Zi Huang
26
0
0
26 Oct 2023
A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
Huy Nguyen
Pedram Akbarian
TrungTin Nguyen
Nhat Ho
32
10
0
22 Oct 2023
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
Boan Liu
Liang Ding
Li Shen
Keqin Peng
Yu Cao
Dazhao Cheng
Dacheng Tao
MoE
36
7
0
15 Oct 2023
Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach
KB Vimal
Saketh Bachu
Tanmay Garg
Niveditha Lakshmi Narasimhan
Raghavan Konuru
Vineeth N. Balasubramanian
42
2
0
05 Sep 2023
MvFS: Multi-view Feature Selection for Recommender System
Youngjune Lee
Yeongjong Jeong
Keunchan Park
SeongKu Kang
30
12
0
05 Sep 2023
Enhancing Mobile Face Anti-Spoofing: A Robust Framework for Diverse Attack Types under Screen Flash
Weihua Liu
Chaochao Lin
Yunzhen Yan
CVBM
AAML
15
1
0
29 Aug 2023
SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget
Rui Kong
Yuanchun Li
Qingtian Feng
Weijun Wang
Xiaozhou Ye
Ye Ouyang
L. Kong
Yunxin Liu
MoE
34
8
0
29 Aug 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
60
9
0
23 Aug 2023
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
Wenyan Cong
Hanxue Liang
Peihao Wang
Zhiwen Fan
Tianlong Chen
M. Varma
Yi Wang
Zhangyang Wang
MoE
34
21
0
22 Aug 2023
Robust Mixture-of-Expert Training for Convolutional Neural Networks
Yihua Zhang
Ruisi Cai
Tianlong Chen
Guanhua Zhang
Huan Zhang
Pin-Yu Chen
Shiyu Chang
Zhangyang Wang
Sijia Liu
MoE
AAML
OOD
34
16
0
19 Aug 2023
AST-MHSA : Code Summarization using Multi-Head Self-Attention
Y. Nagaraj
U. Gupta
18
1
0
10 Aug 2023
From Sparse to Soft Mixtures of Experts
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
121
114
0
02 Aug 2023
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts
Hanrong Ye
Dan Xu
MoE
42
26
0
28 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
22
41
0
12 Jul 2023
Attention Weighted Mixture of Experts with Contrastive Learning for Personalized Ranking in E-commerce
Juan Gong
Zhe Chen
Chao Ma
Zhuojian Xiao
Hong Wang
Guoyu Tang
Lin Liu
Sulong Xu
Bo Long
Yunjiang Jiang
16
4
0
08 Jun 2023
Neural Markov Jump Processes
Patrick Seifner
Ramses J. Sanchez
BDL
35
7
0
31 May 2023
Lifting the Curse of Capacity Gap in Distilling Language Models
Chen Zhang
Yang Yang
Jiahao Liu
Jingang Wang
Yunsen Xian
Benyou Wang
Dawei Song
MoE
32
19
0
20 May 2023
Perpetual Humanoid Control for Real-time Simulated Avatars
Zhengyi Luo
Jinkun Cao
Alexander W. Winkler
Kris M. Kitani
Weipeng Xu
58
88
0
10 May 2023
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
Ye Lin
Shuhan Zhou
Yanyang Li
Anxiang Ma
Tong Xiao
Jingbo Zhu
35
0
0
10 May 2023
Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
Da Xu
Maha Elbayad
Kenton W. Murray
Jean Maillard
Vedanuj Goswami
MoE
47
3
0
03 May 2023
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Tao Lei
Junwen Bai
Siddhartha Brahma
Joshua Ainslie
Kenton Lee
...
Vincent Zhao
Yuexin Wu
Bo-wen Li
Yu Zhang
Ming-Wei Chang
BDL
AI4CE
30
54
0
11 Apr 2023
PoseFusion: Robust Object-in-Hand Pose Estimation with SelectLSTM
Yuyang Tu
Junnan Jiang
Shuang Li
Norman Hendrich
Miaopeng Li
Jianwei Zhang
24
7
0
10 Apr 2023
Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling
Haotao Wang
Ziyu Jiang
Yuning You
Yan Han
Gaowen Liu
Jayanth Srinivasa
Ramana Rao Kompella
Zhangyang Wang
26
28
0
06 Apr 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
Hanlin Wang
Yilu Wu
Sheng Guo
Limin Wang
VGen
DiffM
73
30
0
26 Mar 2023
WM-MoE: Weather-aware Multi-scale Mixture-of-Experts for Blind Adverse Weather Removal
Yulin Luo
Rui Zhao
Xi Wei
Jinwei Chen
Yijie Lu
Shenghao Xie
Tianyu Wang
Ruiqin Xiong
Ming Lu
Shanghang Zhang
31
3
0
24 Mar 2023
Memorization Capacity of Neural Networks with Conditional Computation
Erdem Koyuncu
38
4
0
20 Mar 2023
CoLT5: Faster Long-Range Transformers with Conditional Computation
Joshua Ainslie
Tao Lei
Michiel de Jong
Santiago Ontañón
Siddhartha Brahma
...
Mandy Guo
James Lee-Thorp
Yi Tay
Yun-hsuan Sung
Sumit Sanghai
LLMAG
33
63
0
17 Mar 2023
Adaptive Rotated Convolution for Rotated Object Detection
Yifan Pu
Yiru Wang
Zhuofan Xia
Yizeng Han
Yulin Wang
Weihao Gan
Zidong Wang
S. Song
Gao Huang
23
76
0
14 Mar 2023
Simultaneous Action Recognition and Human Whole-Body Motion and Dynamics Prediction from Wearable Sensors
Kourosh Darvish
S. Ivaldi
Daniele Pucci
AI4CE
28
2
0
14 Mar 2023
ViM: Vision Middleware for Unified Downstream Transferring
Yutong Feng
Biao Gong
Jianwen Jiang
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
32
1
0
13 Mar 2023
HiNet: Novel Multi-Scenario & Multi-Task Learning with Hierarchical Information Extraction
Jie Zhou
Xia Cao
Wenhao Li
Lin Bo
Kun Zhang
Chuan Luo
Qian Yu
29
24
0
10 Mar 2023
SMoA: Sparse Mixture of Adapters to Mitigate Multiple Dataset Biases
Yanchen Liu
Jing Yang
Yan Chen
Jing Liu
Huaqin Wu
MoE
47
2
0
28 Feb 2023