ModuleFormer: Modularity Emerges from Mixture-of-Experts
arXiv:2306.04640 · 7 June 2023
Yikang Shen, Zheyu Zhang, Tianyou Cao, Shawn Tan, Zhenfang Chen, Chuang Gan
Topics: KELM, MoE

Papers citing "ModuleFormer: Modularity Emerges from Mixture-of-Experts"

4 / 4 papers shown

Mixture of Attention Heads: Selecting Attention Heads Per Token
Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
MoE · 11 Oct 2022

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
27 Aug 2021

PonderNet: Learning to Ponder
Andrea Banino, Jan Balaguer, Charles Blundell
PINN, AIMat · 12 Jul 2021

The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat · 31 Dec 2020