Fast Attention Requires Bounded Entries
Josh Alman, Zhao Song
26 February 2023 (arXiv:2302.13214)

Papers citing "Fast Attention Requires Bounded Entries"

50 of 71 citing papers shown:
  • Two Heads Are Better than One: Simulating Large Transformers with Small Ones. Hantao Yu, Josh Alman. 13 Jun 2025.
  • SLICK: Selective Localization and Instance Calibration for Knowledge-Enhanced Car Damage Segmentation in Automotive Insurance. Teerapong Panboonyuen. 12 Jun 2025.
  • Minimalist Softmax Attention Provably Learns Constrained Boolean Functions. Jerry Yao-Chieh Hu, Xiwen Zhang, Maojiang Su, Zhao Song, Han Liu. 26 May 2025.
  • Only Large Weights (And Not Skip Connections) Can Prevent the Perils of Rank Collapse. Josh Alman, Zhao Song. 22 May 2025.
  • Subquadratic Algorithms and Hardness for Attention with Any Temperature. Shreya Gupta, Boyang Huang, Barna Saha, Yinzhan Xu, Christopher Ye. 20 May 2025.
  • Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform. Josh Alman, Zhao Song. 17 May 2025.
  • Theoretical Foundation of Flow-Based Time Series Generation: Provable Approximation, Generalization, and Efficiency. Jiangxuan Long, Zhao Song, Chiwun Yang. 18 Mar 2025.
  • Scaling Law Phenomena Across Regression Paradigms: Multiple and Kernel Approaches. Yifang Chen, Xuyang Guo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song. 03 Mar 2025.
  • Attention Condensation via Sparsity Induced Regularized Training. Eli Sason, Darya Frolova, Boris Nazarov, Felix Goldberd. 03 Mar 2025.
  • Real-Time Personalization with Simple Transformers. Lin An, Andrew A. Li, Vaisnavi Nemala, Gabriel Visotsky. 01 Mar 2025.
  • When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time? Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song. 24 Feb 2025.
  • On Computational Limits of FlowAR Models: Expressivity and Efficiency. Chengyue Gong, Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song. 23 Feb 2025.
  • Compression Barriers for Autoregressive Transformers. Themistoklis Haris, Krzysztof Onak. 21 Feb 2025.
  • Looped ReLU MLPs May Be All You Need as Practical Programmable Computers. Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou. 21 Feb 2025.
  • Low-Rank Thinning. Annabelle Michael Carrell, Albert Gong, Abhishek Shetty, Raaz Dwivedi, Lester W. Mackey. 17 Feb 2025.
  • Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation. Yang Cao, Zhao Song, Chiwun Yang. 01 Feb 2025.
  • Fast Gradient Computation for RoPE Attention in Almost Linear Time. Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song. 03 Jan 2025.
  • $k$NN Attention Demystified: A Theoretical Exploration for Scalable Transformers. Themistoklis Haris. 06 Nov 2024.
  • Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding. Yi Liu, Chengxin Li, Shoukun Xu, Jiawei Han. 19 Oct 2024.
  • Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study. Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song. 15 Oct 2024.
  • HSR-Enhanced Sparse Attention Acceleration. Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song. 14 Oct 2024.
  • Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes. Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou. 12 Oct 2024.
  • LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions. R. Kannan, Chiranjib Bhattacharyya, Praneeth Kacham, David P. Woodruff. 07 Oct 2024.
  • Fundamental Limitations on Subquadratic Alternatives to Transformers. Josh Alman, Hantao Yu. 05 Oct 2024.
  • Differentially Private Kernel Density Estimation. Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu. 03 Sep 2024.
  • A Tighter Complexity Analysis of SparseGPT. Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song. 22 Aug 2024.
  • M5: A Whole Genome Bacterial Encoder at Single Nucleotide Resolution. Agust Egilsson. 03 Jul 2024.
  • When big data actually are low-rank, or entrywise approximation of certain function-generated matrices. Stanislav Budzinskiy. 03 Jul 2024.
  • On the Role of Attention Masks and LayerNorm in Transformers. Xinyi Wu, A. Ajorlou, Yifei Wang, Stefanie Jegelka, Ali Jadbabaie. 29 May 2024.
  • Binary Hypothesis Testing for Softmax Models and Leverage Score Models. Yeqi Gao, Yuzhou Gu, Zhao Song. 09 May 2024.
  • Outlier-Efficient Hopfield Layers for Large Transformer-Based Models. Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Haozheng Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu. 04 Apr 2024.
  • Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models. Dennis Wu, Jerry Yao-Chieh Hu, Teng-Yun Hsiao, Han Liu. 04 Apr 2024.
  • Do Efficient Transformers Really Save Computation? Kai-Bo Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, Liwei Wang. 21 Feb 2024.
  • The I/O Complexity of Attention, or How Optimal is Flash Attention? Barna Saha, Christopher Ye. 12 Feb 2024.
  • SubGen: Token Generation in Sublinear Time and Memory. A. Zandieh, Insu Han, Vahab Mirrokni, Amin Karbasi. 08 Feb 2024.
  • The Fine-Grained Complexity of Gradient Computation for Training Large Language Models. Josh Alman, Zhao Song. 07 Feb 2024.
  • The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry. Michael Zhang, Kush S. Bhatia, Hermann Kumbong, Christopher Ré. 06 Feb 2024.
  • One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space. Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang. 24 Nov 2023.
  • Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training. Lianke Qin, Saayan Mitra, Zhao Song, Yuanyuan Yang, Dinesh Manocha. 19 Nov 2023.
  • Hardness of Low Rank Approximation of Entrywise Transformed Matrix Products. Tamás Sarlós, Xingyou Song, David P. Woodruff, Qiuyi Zhang. 03 Nov 2023.
  • The Expressibility of Polynomial based Attention Scheme. Zhao Song, Guangyi Xu, Junze Yin. 30 Oct 2023.
  • Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time. Zichang Liu, Jue Wang, Tri Dao, Dinesh Manocha, Binhang Yuan, ..., Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Ré, Beidi Chen. 26 Oct 2023.
  • HyperAttention: Long-context Attention in Near-Linear Time. Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, A. Zandieh. 09 Oct 2023.
  • How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation. Josh Alman, Zhao Song. 06 Oct 2023.
  • PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels. Praneeth Kacham, Vahab Mirrokni, Peilin Zhong. 02 Oct 2023.
  • Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network? Lianke Qin, Zhao Song, Baocheng Sun. 14 Sep 2023.
  • A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time. Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin. 14 Sep 2023.
  • Online Adaptive Mahalanobis Distance Estimation. Lianke Qin, Aravind Reddy, Zhao Song. 02 Sep 2023.
  • Solving Attention Kernel Regression Problem via Pre-conditioner. Zhao Song, Junze Yin, Licheng Zhang. 28 Aug 2023.
  • How to Protect Copyright Data in Optimization of Large Language Models? T. Chu, Zhao Song, Chiwun Yang. 23 Aug 2023.