arXiv:2006.04862 (v2, latest)
$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers
8 June 2020
Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
Papers citing "$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers"
48 of 48 papers shown
Rectifying LLM Thought from Lens of Optimization. J. Liu, Hongwei Liu, Songyang Zhang, Kai Chen. 01 Dec 2025.
On the Capacity of Self-Attention. Micah Adler. 26 Sep 2025.
Transformers in Pseudo-Random Number Generation: A Dual Perspective on Theory and Practice. Ran Li, Lingshu Zeng. 02 Aug 2025.
BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers. Patrik Okanovic, Sameer Deshmukh, Grzegorz Kwaśniewski, Yi Zhu, Haruto Fujii, ..., Maciej Besta, Kentaro Katayama, Takumi Honda, Yusuke Nagasaka, Torsten Hoefler. 03 Jul 2025.
Two Heads Are Better than One: Simulating Large Transformers with Small Ones. Hantao Yu, Josh Alman. 13 Jun 2025.
DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration. Hanzhi Zhang, Heng Fan, Kewei Sha, Yan Huang, Yunhe Feng. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 06 Jun 2025.
Approximation Rate of the Transformer Architecture for Sequence Modeling. Hao Jiang, Qianxiao Li. Neural Information Processing Systems (NeurIPS), 2023. 03 Jan 2025.
How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs. Guhao Feng, Kai-Bo Yang, Yuntian Gu, Xinyue Ai, Shengjie Luo, Jiacheng Sun, Di He, Hao Sun, Liwei Wang. Annual Meeting of the Association for Computational Linguistics (ACL), 2024. 17 Oct 2024.
Snuffy: Efficient Whole Slide Image Classifier. Hossein Jafarinia, Alireza Alipanah, Danial Hamdi, Saeed Razavi, Nahal Mirzaie, M. Rohban. European Conference on Computer Vision (ECCV), 2024. 15 Aug 2024.
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads. Ali Khaleghi Rahimian, Manish Kumar Govind, Subhajit Maity, Dominick Reilly, Christian Kummerle, Srijan Das, A. Dutta. 27 Jun 2024.
FrameQuant: Flexible Low-Bit Quantization for Transformers. Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang, Vikas Singh. International Conference on Machine Learning (ICML), 2024. 10 Mar 2024.
Transformers are Expressive, But Are They Expressive Enough for Regression? Swaroop Nath, H. Khadilkar, Pushpak Bhattacharyya. 23 Feb 2024.
Implicit Bias and Fast Convergence Rates for Self-attention. Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis. 08 Feb 2024.
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models. Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, A. Eshaghi. 03 Feb 2024.
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey. Yunpeng Huang, Jingwei Xu, Junyu Lai, Zixu Jiang, Taolue Chen, ..., Xiaoxing Ma, Lijuan Yang, Zhou Xin, Shupeng Li, Penghao Zhao. 21 Nov 2023.
The Expressive Power of Low-Rank Adaptation. Yuchen Zeng, Kangwook Lee. International Conference on Learning Representations (ICLR), 2023. 26 Oct 2023.
On the Optimization and Generalization of Multi-head Attention. Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis. 19 Oct 2023.
Do Generative Large Language Models need billions of parameters? Sia Gholami, Marwan Omar. 12 Sep 2023.
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? T. Kajitsuka, Issei Sato. International Conference on Learning Representations (ICLR), 2023. 26 Jul 2023.
Trained Transformers Learn Linear Models In-Context. Ruiqi Zhang, Spencer Frei, Peter L. Bartlett. Journal of Machine Learning Research (JMLR), 2023. 16 Jun 2023.
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers. Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann. Neural Information Processing Systems (NeurIPS), 2023. 25 May 2023.
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective. Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang. Neural Information Processing Systems (NeurIPS), 2023. 24 May 2023.
Sampled Transformer for Point Sets. Shidi Li, Christian J. Walder, Alexander Soen, Lexing Xie, Miaomiao Liu. 28 Feb 2023.
A Brief Survey on the Approximation Theory for Sequence Modelling. Hao Jiang, Qianxiao Li, Zhong Li, Shida Wang. Journal of Machine Learning (JML), 2023. 27 Feb 2023.
One Fits All: Power General Time Series Analysis by Pretrained LM. Tian Zhou, Peisong Niu, Qingsong Wen, Liang Sun, Rong Jin. Neural Information Processing Systems (NeurIPS), 2023. 23 Feb 2023.
Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers. K. Choromanski, Shanda Li, Valerii Likhosherstov, Kumar Avinava Dubey, Shengjie Luo, Di He, Yiming Yang, Tamás Sarlós, Thomas Weingarten, Adrian Weller. International Conference on Artificial Intelligence and Statistics (AISTATS), 2023. 03 Feb 2023.
Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost. Sungjun Cho, Seonwoo Min, Jinwoo Kim, Moontae Lee, Honglak Lee, Seunghoon Hong. Neural Information Processing Systems (NeurIPS), 2022. 27 Oct 2022.
Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for Long Sequences. Aosong Feng, Irene Li, Yuang Jiang, Rex Ying. AAAI Conference on Artificial Intelligence (AAAI), 2022. 21 Oct 2022.
Treeformer: Dense Gradient Trees for Efficient Attention Computation. Lovish Madaan, Srinadh Bhojanapalli, Himanshu Jain, Prateek Jain. International Conference on Learning Representations (ICLR), 2022. 18 Aug 2022.
Your Transformer May Not be as Powerful as You Expect. Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, Di He. Neural Information Processing Systems (NeurIPS), 2022. 26 May 2022.
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers. Arda Sahiner, Tolga Ergen, Batu Mehmet Ozturkler, John M. Pauly, Morteza Mardani, Mert Pilanci. International Conference on Machine Learning (ICML), 2022. 17 May 2022.
Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice. Andreas Grivas, Nikolay Bogoychev, Adam Lopez. Annual Meeting of the Association for Computational Linguistics (ACL), 2022. 12 Mar 2022.
Attention Enables Zero Approximation Error. Zhiying Fang, Yidong Ouyang, Ding-Xuan Zhou, Guang Cheng. 24 Feb 2022.
Revisiting Over-smoothing in BERT from the Perspective of Graph. Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, James T. Kwok. International Conference on Learning Representations (ICLR), 2022. 17 Feb 2022.
Can Vision Transformers Perform Convolution? Shanda Li, Xiangning Chen, Di He, Cho-Jui Hsieh. 02 Nov 2021.
Leveraging redundancy in attention with Reuse Transformers. Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar. 13 Oct 2021.
Universal Approximation Under Constraints is Possible with Transformers. Anastasis Kratsios, Behnoosh Zamanlooy, Tianlin Liu, Ivan Dokmanić. 07 Oct 2021.
Continuous Streaming Multi-Talker ASR with Dual-path Transducers. Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li. 17 Sep 2021.
MATE: Multi-view Attention for Table Transformer Efficiency. Julian Martin Eisenschlos, Maharshi Gor, Thomas Müller, William W. Cohen. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. 09 Sep 2021.
Combiner: Full Attention Transformer with Sparse Computation Cost. Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai. 12 Jul 2021.
Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation. Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit. 16 Jun 2021.
Rethinking Graph Transformers with Spectral Attention. Devin Kreuzer, Dominique Beaini, William L. Hamilton, Vincent Létourneau, Prudencio Tossou. Neural Information Processing Systems (NeurIPS), 2021. 07 Jun 2021.
On the Expressive Power of Self-Attention Matrices. Valerii Likhosherstov, K. Choromanski, Adrian Weller. 07 Jun 2021.
Learning and Generalization in RNNs. A. Panigrahi, Navin Goyal. Neural Information Processing Systems (NeurIPS), 2021. 31 May 2021.
SparseBERT: Rethinking the Importance Analysis in Self-attention. Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok. International Conference on Machine Learning (ICML), 2021. 25 Feb 2021.
A Survey on Visual Transformer. Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, ..., Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020. 23 Dec 2020.
Efficient Transformers: A Survey. Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. ACM Computing Surveys (ACM CSUR), 2020. 14 Sep 2020.
Big Bird: Transformers for Longer Sequences. Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. Neural Information Processing Systems (NeurIPS), 2020. 28 Jul 2020.