Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization

1 August 2022
T. Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang
arXiv:2208.00579

Papers citing "Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization"

6 papers shown
Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev, Tan M. Nguyen
02 Mar 2025

MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo, Tan M. Nguyen
18 Oct 2024 · MoE

Breaking the Attention Bottleneck
Kalle Hilsenbek
16 Jun 2024

How Does Momentum Benefit Deep Neural Networks Architecture Design? A Few Case Studies
Bao Wang, Hedi Xia, T. Nguyen, Stanley Osher
13 Oct 2021 · AI4CE

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
28 Jul 2020 · VLM

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
12 Mar 2020 · MoE