arXiv:2312.07987
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
13 December 2023
Róbert Csordás, Piotr Piekos, Kazuki Irie, Jürgen Schmidhuber
Tags: MoE
Papers citing "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" (6 papers)
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos, Róbert Csordás, Jürgen Schmidhuber
Tags: MoE, VLM
01 May 2025
RouterKT: Mixture-of-Experts for Knowledge Tracing
Han Liao, Shuaishuai Zu
11 Apr 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu, Sen Lin
Tags: MoE
10 Mar 2025
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, J. Gehrke, Eric Horvitz, ..., Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang
Tags: ELM, AI4MH, AI4CE, ALM
22 Mar 2023
Mixture of Attention Heads: Selecting Attention Heads Per Token
Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
Tags: MoE
11 Oct 2022
In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah
24 Sep 2022