Improving Transformers with Dynamically Composable Multi-Head Attention
arXiv:2405.08553 · 14 May 2024
Da Xiao, Qingye Meng, Shengping Li, Xingyuan Yuan
Papers citing "Improving Transformers with Dynamically Composable Multi-Head Attention" (4 of 4 papers shown)
Mixture of Attention Heads: Selecting Attention Heads Per Token
Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
MoE · 11 Oct 2022
Transformer Quality in Linear Time
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le
21 Feb 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat · 31 Dec 2020
Talking-Heads Attention
Noam M. Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, L. Hou
05 Mar 2020