ResearchTrend.AI
Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

2 March 2023
Tianlong Chen
Zhenyu (Allen) Zhang
Ajay Jaiswal
Shiwei Liu
Zhangyang Wang
    MoE
arXiv: 2303.01610

Papers citing "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers"

14 / 14 papers shown
Improving Routing in Sparse Mixture of Experts with Graph of Tokens
Tam Minh Nguyen
Ngoc N. Tran
Khai Nguyen
Richard G. Baraniuk
MoE
59
0
0
01 May 2025
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities
Raman Dutt
Harleen Hanspal
Guoxuan Xia
Petru-Daniel Tudosiu
Alexander Black
Yongxin Yang
Steven G. McDonagh
Sarah Parisot
MoE
38
0
0
28 Mar 2025
CAMEx: Curvature-aware Merging of Experts
Dung V. Nguyen
Minh H. Nguyen
Luc Q. Nguyen
R. Teo
T. Nguyen
Linh Duy Tran
MoMe
73
2
0
26 Feb 2025
More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing
Sagi Shaier
Francisco Pereira
K. Wense
Lawrence E Hunter
Matt Jones
MoE
41
0
0
10 Oct 2024
Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu
Zeyu Huang
Shuang Cheng
Yizhi Zhou
Zili Wang
Ivan Titov
Jie Fu
MoE
68
2
0
13 Aug 2024
Mixture of Low-rank Experts for Transferable AI-Generated Image Detection
Zihan Liu
Hanyi Wang
Yaoyu Kang
Shilin Wang
MoE
34
12
0
07 Apr 2024
SiRA: Sparse Mixture of Low Rank Adaptation
Yun Zhu
Nevan Wichers
Chu-Cheng Lin
Xinyi Wang
Tianlong Chen
...
Han Lu
Canoee Liu
Liangchen Luo
Jindong Chen
Lei Meng
MoE
19
27
0
15 Nov 2023
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
Wenyan Cong
Hanxue Liang
Peihao Wang
Zhiwen Fan
Tianlong Chen
M. Varma
Yi Wang
Zhangyang Wang
MoE
22
21
0
22 Aug 2023
Mixture-of-Experts with Expert Choice Routing
Yanqi Zhou
Tao Lei
Hanxiao Liu
Nan Du
Yanping Huang
Vincent Zhao
Andrew M. Dai
Zhifeng Chen
Quoc V. Le
James Laudon
MoE
147
323
0
18 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Towards More Effective and Economic Sparsely-Activated Model
Hao Jiang
Ke Zhan
Jianwei Qu
Yongkang Wu
Zhaoye Fei
...
Enrui Hu
Yinxia Zhang
Yantao Jia
Fan Yu
Zhao Cao
MoE
134
12
0
14 Oct 2021
Sequence Length is a Domain: Length-based Overfitting in Transformer Models
Dušan Variš
Ondřej Bojar
49
56
0
15 Sep 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,424
0
23 Jan 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018