2110.07034
Cited By
How Does Momentum Benefit Deep Neural Networks Architecture Design? A Few Case Studies
13 October 2021
Bao Wang
Hedi Xia
T. Nguyen
Stanley Osher
AI4CE
ArXiv
Papers citing
"How Does Momentum Benefit Deep Neural Networks Architecture Design? A Few Case Studies"
8 / 8 papers shown
Title
CAMEx: Curvature-aware Merging of Experts
Dung V. Nguyen
Minh H. Nguyen
Luc Q. Nguyen
R. Teo
T. Nguyen
Linh Duy Tran
MoMe
73
2
0
26 Feb 2025
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
R. Teo
Tan M. Nguyen
MoE
31
3
0
18 Oct 2024
Physics-informed Machine Learning for Calibrating Macroscopic Traffic Flow Models
Yu Tang
Li Jin
K. Ozbay
AI4CE
18
1
0
12 Jul 2023
Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
T. Nguyen
Richard G. Baraniuk
Robert M. Kirby
Stanley J. Osher
Bao Wang
21
9
0
01 Aug 2022
Learning POD of Complex Dynamics Using Heavy-ball Neural ODEs
Justin Baker
E. Cherkaev
A. Narayan
Bao Wang
AI4CE
9
4
0
24 Feb 2022
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
249
2,009
0
28 Jul 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy
M. Saffar
Ashish Vaswani
David Grangier
MoE
238
578
0
12 Mar 2020
A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
Weijie Su
Stephen P. Boyd
Emmanuel J. Candes
97
1,150
0
04 Mar 2015