On the Expressive Power of Self-Attention Matrices
Valerii Likhosherstov, K. Choromanski, Adrian Weller
arXiv:2106.03764 · 7 June 2021
Papers citing "On the Expressive Power of Self-Attention Matrices" (14 of 14 shown):

| Title | Authors | Tags | Metrics | Date |
|---|---|---|---|---|
| Attention Condensation via Sparsity Induced Regularized Training | Eli Sason, Darya Frolova, Boris Nazarov, Felix Goldberd | | 157 / 0 / 0 | 03 Mar 2025 |
| How Smooth Is Attention? | Valérie Castin, Pierre Ablin, Gabriel Peyré | AAML | 40 / 9 / 0 | 22 Dec 2023 |
| Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? | T. Kajitsuka, Issei Sato | | 29 / 16 / 0 | 26 Jul 2023 |
| Self-attention Dual Embedding for Graphs with Heterophily | Yurui Lai, Taiyan Zhang, Rui Fan | GNN | 29 / 0 / 0 | 28 May 2023 |
| A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity | Hongkang Li, M. Wang, Sijia Liu, Pin-Yu Chen | ViT, MLT | 35 / 56 / 0 | 12 Feb 2023 |
| Pure Transformers are Powerful Graph Learners | Jinwoo Kim, Tien Dat Nguyen, Seonwoo Min, Sungjun Cho, Moontae Lee, Honglak Lee, Seunghoon Hong | | 32 / 187 / 0 | 06 Jul 2022 |
| Self-attention Presents Low-dimensional Knowledge Graph Embeddings for Link Prediction | Peyman Baghershahi, Reshad Hosseini, H. Moradi | | 26 / 52 / 0 | 20 Dec 2021 |
| Can Vision Transformers Perform Convolution? | Shanda Li, Xiangning Chen, Di He, Cho-Jui Hsieh | ViT | 24 / 19 / 0 | 02 Nov 2021 |
| Inductive Biases and Variable Creation in Self-Attention Mechanisms | Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang | | 22 / 115 / 0 | 19 Oct 2021 |
| Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu H. Pham, Quoc V. Le, Yun-hsuan Sung, Zhen Li, Tom Duerig | VLM, CLIP | 293 / 3,693 / 0 | 11 Feb 2021 |
| Extracting Training Data from Large Language Models | Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel | MLAU, SILM | 267 / 1,812 / 0 | 14 Dec 2020 |
| Big Bird: Transformers for Longer Sequences | Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed | VLM | 251 / 2,012 / 0 | 28 Jul 2020 |
| Efficient Content-Based Sparse Attention with Routing Transformers | Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier | MoE | 238 / 579 / 0 | 12 Mar 2020 |
| Structured adaptive and random spinners for fast machine learning computations | Mariusz Bojarski, A. Choromańska, K. Choromanski, Francois Fagan, Cédric Gouy-Pailler, Anne Morvan, Nourhan Sakr, Tamás Sarlós, Jamal Atif | | 25 / 35 / 0 | 19 Oct 2016 |