Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
arXiv 2208.00579 · 1 August 2022
T. Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang
Papers citing "Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization" (6 of 6 papers shown)
Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev, Tan M. Nguyen · 2 citations · 02 Mar 2025

MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo, Tan M. Nguyen · [MoE] · 3 citations · 18 Oct 2024

Breaking the Attention Bottleneck
Kalle Hilsenbek · 0 citations · 16 Jun 2024

How Does Momentum Benefit Deep Neural Networks Architecture Design? A Few Case Studies
Bao Wang, Hedi Xia, T. Nguyen, Stanley Osher · [AI4CE] · 10 citations · 13 Oct 2021

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed · [VLM] · 2,015 citations · 28 Jul 2020

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier · [MoE] · 580 citations · 12 Mar 2020