Improving Transformers with Probabilistic Attention Keys (arXiv:2110.08678)
16 October 2021
Tam Nguyen, Tan M. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher
Papers citing "Improving Transformers with Probabilistic Attention Keys" (10 of 10 papers shown):

Generalization Guarantees for Multi-View Representation Learning and Application to Regularization via Gaussian Product Mixture Prior
Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski. 25 Apr 2025.

Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev, Tan M. Nguyen. 02 Mar 2025.

Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski. 21 Feb 2025.

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao. 07 Apr 2023. [VLM]

Beyond EM Algorithm on Over-specified Two-Component Location-Scale Gaussian Mixtures
Tongzheng Ren, Fuheng Cui, Sujay Sanghavi, Nhat Ho. 23 May 2022.

An Exponentially Increasing Step-size for Parameter Estimation in Statistical Models
Nhat Ho, Tongzheng Ren, Sujay Sanghavi, Purnamrita Sarkar, Rachel A. Ward. 16 May 2022.

Architecture Agnostic Federated Learning for Neural Networks
Disha Makhija, Xing Han, Nhat Ho, Joydeep Ghosh. 15 Feb 2022. [FedML]

How Does Momentum Benefit Deep Neural Networks Architecture Design? A Few Case Studies
Bao Wang, Hedi Xia, Tan M. Nguyen, Stanley J. Osher. 13 Oct 2021. [AI4CE]

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier. 12 Mar 2020. [MoE]

A Decomposable Attention Model for Natural Language Inference
Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit. 06 Jun 2016.