arXiv:2312.07987
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
13 December 2023
Róbert Csordás, Piotr Piekos, Kazuki Irie, Jürgen Schmidhuber
Tags: MoE
Papers citing "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" (6 papers)
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber
Tags: MoE, VLM
01 May 2025
RouterKT: Mixture-of-Experts for Knowledge Tracing
Han Liao, Shuaishuai Zu
11 Apr 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu, Sen Lin
Tags: MoE
10 Mar 2025
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, J. Gehrke, Eric Horvitz, ..., Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang
Tags: ELM, AI4MH, AI4CE, ALM
22 Mar 2023
Mixture of Attention Heads: Selecting Attention Heads Per Token
Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
Tags: MoE
11 Oct 2022
In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah
24 Sep 2022