ModuleFormer: Modularity Emerges from Mixture-of-Experts

ModuleFormer: Modularity Emerges from Mixture-of-Experts

7 June 2023

Chuang Gan

Papers citing "ModuleFormer: Modularity Emerges from Mixture-of-Experts"

4 / 4 papers shown

Title
Mixture of Attention Heads: Selecting Attention Heads Per Token Xiaofeng Zhang Yikang Shen Zeyu Huang Jie Zhou Wenge Rong Zhang Xiong MoE 93 42 0 11 Oct 2022
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation Ofir Press Noah A. Smith M. Lewis 234 690 0 27 Aug 2021
PonderNet: Learning to Ponder Andrea Banino Jan Balaguer Charles Blundell PINN AIMat 90 80 0 12 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe ... Horace He Anish Thite Noa Nabeshima Shawn Presser Connor Leahy AIMat 242 1,977 0 31 Dec 2020