arXiv:2310.10837
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
16 October 2023 · MoE
Papers citing "Approximating Two-Layer Feedforward Networks for Efficient Transformers" (5 papers)
Improving Routing in Sparse Mixture of Experts with Graph of Tokens
Tam Minh Nguyen, Ngoc N. Tran, Khai Nguyen, Richard G. Baraniuk
MoE · 01 May 2025

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber
MoE, VLM · 01 May 2025

Toy Models of Superposition
Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah
AAML, MILM · 21 Sep 2022

Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts
W. Kool, Chris J. Maddison, A. Mnih
24 Sep 2021

A Decomposable Attention Model for Natural Language Inference
Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit
06 Jun 2016