$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers

8 June 2020
Chulhee Yun
Yin-Wen Chang
Srinadh Bhojanapalli
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
ArXiv (abs) · PDF · HTML
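
As context for the result in the title: a sparse transformer restricts each token to attend to only a small set of positions, so the number of attention connections per layer grows as O(n) rather than the dense O(n^2). The sketch below is a minimal, hypothetical illustration of one such pattern, a sliding window plus a single global token in the spirit of sparse models such as Big Bird (cited below); it is not the paper's specific construction, and the helper name is ours.

```python
# Minimal sketch of an O(n)-connection sparsity pattern (illustrative only,
# not the specific construction analyzed in the paper): a sliding window of
# width w combined with one global token, similar in spirit to Big Bird.
import numpy as np

def sparse_attention_mask(n: int, w: int = 1) -> np.ndarray:
    """Boolean n x n mask where mask[i, j] is True iff token i may attend to j."""
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        # Local sliding window: each token sees its w left and w right neighbors.
        mask[i, max(0, i - w):min(n, i + w + 1)] = True
    # Global token 0 attends to, and is attended by, every position.
    mask[0, :] = True
    mask[:, 0] = True
    return mask

if __name__ == "__main__":
    for n in (8, 16, 32):
        m = sparse_attention_mask(n, w=1)
        # The connection count grows roughly like (2w + 3) * n, i.e. linearly
        # in n, instead of the n * n of dense attention.
        print(n, int(m.sum()))
```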

Papers citing "$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers"

48 / 48 papers shown
Rectifying LLM Thought from Lens of Optimization
J. Liu
Hongwei Liu
Songyang Zhang
Kai Chen
LRM
128
1
0
01 Dec 2025
On the Capacity of Self-Attention
Micah Adler
193
0
0
26 Sep 2025
Transformers in Pseudo-Random Number Generation: A Dual Perspective on Theory and Practice
Ran Li
Lingshu Zeng
115
0
0
02 Aug 2025
BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers
Patrik Okanovic
Sameer Deshmukh
Grzegorz Kwaśniewski
Yi Zhu
Haruto Fujii
...
Maciej Besta
Kentaro Katayama
Takumi Honda
Yusuke Nagasaka
Torsten Hoefler
204
0
0
03 Jul 2025
Two Heads Are Better than One: Simulating Large Transformers with Small Ones
Hantao Yu
Josh Alman
232
0
0
13 Jun 2025
DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hanzhi Zhang
Heng Fan
Kewei Sha
Yan Huang
Yunhe Feng
191
2
0
06 Jun 2025
Approximation Rate of the Transformer Architecture for Sequence Modeling
Neural Information Processing Systems (NeurIPS), 2023
Hao Jiang
Qianxiao Li
529
17
0
03 Jan 2025
How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Guhao Feng
Kai-Bo Yang
Yuntian Gu
Xinyue Ai
Shengjie Luo
Jiacheng Sun
Di He
Hao Sun
Liwei Wang
LRM
315
13
0
17 Oct 2024
Snuffy: Efficient Whole Slide Image Classifier
European Conference on Computer Vision (ECCV), 2024
Hossein Jafarinia
Alireza Alipanah
Danial Hamdi
Saeed Razavi
Nahal Mirzaie
M. Rohban
3DH
330
7
0
15 Aug 2024
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads
Ali Khaleghi Rahimian
Manish Kumar Govind
Subhajit Maity
Dominick Reilly
Christian Kummerle
Srijan Das
A. Dutta
237
1
0
27 Jun 2024
FrameQuant: Flexible Low-Bit Quantization for Transformers
International Conference on Machine Learning (ICML), 2024
Harshavardhan Adepu
Zhanpeng Zeng
Li Zhang
Vikas Singh
MQ
174
15
0
10 Mar 2024
Transformers are Expressive, But Are They Expressive Enough for Regression?
Swaroop Nath
H. Khadilkar
Pushpak Bhattacharyya
216
5
0
23 Feb 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
398
28
0
08 Feb 2024
Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
Xindi Wang
Mahsa Salmani
Parsa Omidi
Xiangyu Ren
Mehdi Rezagholizadeh
A. Eshaghi
LRM
271
83
0
03 Feb 2024
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Yunpeng Huang
Jingwei Xu
Junyu Lai
Zixu Jiang
Taolue Chen
...
Xiaoxing Ma
Lijuan Yang
Zhou Xin
Shupeng Li
Penghao Zhao
LLMAG, KELM
370
101
0
21 Nov 2023
The Expressive Power of Low-Rank Adaptation
International Conference on Learning Representations (ICLR), 2023
Yuchen Zeng
Kangwook Lee
490
96
0
26 Oct 2023
On the Optimization and Generalization of Multi-head Attention
Puneesh Deora
Rouzbeh Ghaderi
Hossein Taheri
Christos Thrampoulidis
MLT
284
43
0
19 Oct 2023
Do Generative Large Language Models need billions of parameters?
Sia Gholami
Marwan Omar
188
27
0
12 Sep 2023
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
International Conference on Learning Representations (ICLR), 2023
T. Kajitsuka
Issei Sato
456
29
0
26 Jul 2023
Trained Transformers Learn Linear Models In-Context
Journal of Machine Learning Research (JMLR), 2023
Ruiqi Zhang
Spencer Frei
Peter L. Bartlett
412
281
0
16 Jun 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Neural Information Processing Systems (NeurIPS), 2023
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurelien Lucchi
Thomas Hofmann
367
70
0
25 May 2023
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
Neural Information Processing Systems (NeurIPS), 2023
Guhao Feng
Bohang Zhang
Yuntian Gu
Haotian Ye
Di He
Liwei Wang
LRM
656
356
0
24 May 2023
Sampled Transformer for Point Sets
Shidi Li
Christian J. Walder
Alexander Soen
Lexing Xie
Miaomiao Liu
3DPC
177
1
0
28 Feb 2023
A Brief Survey on the Approximation Theory for Sequence Modelling
Journal of Machine Learning (JML), 2023
Hao Jiang
Qianxiao Li
Zhong Li
Shida Wang
AI4TS
260
14
0
27 Feb 2023
One Fits All:Power General Time Series Analysis by Pretrained LM
Neural Information Processing Systems (NeurIPS), 2023
Tian Zhou
Peisong Niu
Qingsong Wen
Liang Sun
Rong Jin
AI4TS
501
740
0
23 Feb 2023
Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
K. Choromanski
Shanda Li
Valerii Likhosherstov
Kumar Avinava Dubey
Shengjie Luo
Di He
Yiming Yang
Tamás Sarlós
Thomas Weingarten
Adrian Weller
324
10
0
03 Feb 2023
Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost
Neural Information Processing Systems (NeurIPS), 2022
Sungjun Cho
Seonwoo Min
Jinwoo Kim
Moontae Lee
Honglak Lee
Seunghoon Hong
228
4
0
27 Oct 2022
Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for Long Sequences
AAAI Conference on Artificial Intelligence (AAAI), 2022
Aosong Feng
Irene Li
Yuang Jiang
Rex Ying
207
18
0
21 Oct 2022
Treeformer: Dense Gradient Trees for Efficient Attention Computation
International Conference on Learning Representations (ICLR), 2022
Lovish Madaan
Srinadh Bhojanapalli
Himanshu Jain
Prateek Jain
170
9
0
18 Aug 2022
Your Transformer May Not be as Powerful as You Expect
Neural Information Processing Systems (NeurIPS), 2022
Shengjie Luo
Shanda Li
Shuxin Zheng
Tie-Yan Liu
Liwei Wang
Di He
312
66
0
26 May 2022
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers
International Conference on Machine Learning (ICML), 2022
Arda Sahiner
Tolga Ergen
Batu Mehmet Ozturkler
John M. Pauly
Morteza Mardani
Mert Pilanci
314
36
0
17 May 2022
Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Andreas Grivas
Nikolay Bogoychev
Adam Lopez
189
13
0
12 Mar 2022
Attention Enables Zero Approximation Error
Zhiying Fang
Yidong Ouyang
Ding-Xuan Zhou
Guang Cheng
159
5
0
24 Feb 2022
Revisiting Over-smoothing in BERT from the Perspective of Graph
International Conference on Learning Representations (ICLR), 2022
Han Shi
Jiahui Gao
Hang Xu
Xiaodan Liang
Zhenguo Li
Lingpeng Kong
Stephen M. S. Lee
James T. Kwok
216
90
0
17 Feb 2022
Can Vision Transformers Perform Convolution?
Shanda Li
Xiangning Chen
Di He
Cho-Jui Hsieh
ViT
210
22
0
02 Nov 2021
Leveraging redundancy in attention with Reuse Transformers
Srinadh Bhojanapalli
Ayan Chakrabarti
Andreas Veit
Michal Lukasik
Himanshu Jain
Frederick Liu
Yin-Wen Chang
Sanjiv Kumar
155
37
0
13 Oct 2021
Universal Approximation Under Constraints is Possible with Transformers
Anastasis Kratsios
Behnoosh Zamanlooy
Tianlin Liu
Ivan Dokmanić
308
33
0
07 Oct 2021
Continuous Streaming Multi-Talker ASR with Dual-path Transducers
Desh Raj
Liang Lu
Zhuo Chen
Yashesh Gaur
Jinyu Li
118
19
0
17 Sep 2021
MATE: Multi-view Attention for Table Transformer Efficiency
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Julian Martin Eisenschlos
Maharshi Gor
Thomas Müller
William W. Cohen
LMTD
185
100
0
09 Sep 2021
Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren
H. Dai
Zihang Dai
Mengjiao Yang
J. Leskovec
Dale Schuurmans
Bo Dai
338
93
0
12 Jul 2021
Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation
Srinadh Bhojanapalli
Ayan Chakrabarti
Himanshu Jain
Sanjiv Kumar
Michal Lukasik
Andreas Veit
110
10
0
16 Jun 2021
Rethinking Graph Transformers with Spectral Attention
Neural Information Processing Systems (NeurIPS), 2021
Devin Kreuzer
Dominique Beaini
William L. Hamilton
Vincent Létourneau
Prudencio Tossou
485
691
0
07 Jun 2021
On the Expressive Power of Self-Attention Matrices
Valerii Likhosherstov
K. Choromanski
Adrian Weller
350
43
0
07 Jun 2021
Learning and Generalization in RNNs
Neural Information Processing Systems (NeurIPS), 2021
A. Panigrahi
Navin Goyal
233
3
0
31 May 2021
SparseBERT: Rethinking the Importance Analysis in Self-attention
International Conference on Machine Learning (ICML), 2021
Han Shi
Jiahui Gao
Xiaozhe Ren
Hang Xu
Xiaodan Liang
Zhenguo Li
James T. Kwok
197
59
0
25 Feb 2021
A Survey on Visual Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Kai Han
Yunhe Wang
Hanting Chen
Xinghao Chen
Jianyuan Guo
...
Chunjing Xu
Yixing Xu
Zhaohui Yang
Yiman Zhang
Dacheng Tao
ViT
1.1K
3,095
0
23 Dec 2020
Efficient Transformers: A Survey
ACM Computing Surveys (ACM CSUR), 2020
Yi Tay
Mostafa Dehghani
Dara Bahri
Donald Metzler
VLM
866
1,362
0
14 Sep 2020
Big Bird: Transformers for Longer Sequences
Neural Information Processing Systems (NeurIPS), 2020
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
1.3K
2,532
0
28 Jul 2020