Fast Attention Requires Bounded Entries
Josh Alman, Zhao Song
26 February 2023 (arXiv:2302.13214)

Papers citing "Fast Attention Requires Bounded Entries"

50 of 71 citing papers shown:
  • Two Heads Are Better than One: Simulating Large Transformers with Small Ones. Hantao Yu, Josh Alman. 13 Jun 2025.
  • SLICK: Selective Localization and Instance Calibration for Knowledge-Enhanced Car Damage Segmentation in Automotive Insurance. Teerapong Panboonyuen. 12 Jun 2025.
  • Minimalist Softmax Attention Provably Learns Constrained Boolean Functions. Jerry Yao-Chieh Hu, Xiwen Zhang, Maojiang Su, Zhao Song, Han Liu. 26 May 2025.
  • Only Large Weights (And Not Skip Connections) Can Prevent the Perils of Rank Collapse. Josh Alman, Zhao Song. 22 May 2025.
  • Subquadratic Algorithms and Hardness for Attention with Any Temperature. Shreya Gupta, Boyang Huang, Barna Saha, Yinzhan Xu, Christopher Ye. 20 May 2025.
  • Fast RoPE Attention: Combining the Polynomial Method and Fast Fourier Transform. Josh Alman, Zhao Song. 17 May 2025.
  • Theoretical Foundation of Flow-Based Time Series Generation: Provable Approximation, Generalization, and Efficiency. Jiangxuan Long, Zhao Song, Chiwun Yang. 18 Mar 2025.
  • Scaling Law Phenomena Across Regression Paradigms: Multiple and Kernel Approaches. Yifang Chen, Xuyang Guo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song. 03 Mar 2025.
  • Attention Condensation via Sparsity Induced Regularized Training. Eli Sason, Darya Frolova, Boris Nazarov, Felix Goldberd. 03 Mar 2025.
  • Real-Time Personalization with Simple Transformers. Lin An, Andrew A. Li, Vaisnavi Nemala, Gabriel Visotsky. 01 Mar 2025.
  • When Can We Solve the Weighted Low Rank Approximation Problem in Truly Subquadratic Time? Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song. 24 Feb 2025.
  • On Computational Limits of FlowAR Models: Expressivity and Efficiency. Chengyue Gong, Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song. 23 Feb 2025.
  • Compression Barriers for Autoregressive Transformers. Themistoklis Haris, Krzysztof Onak. 21 Feb 2025.
  • Looped ReLU MLPs May Be All You Need as Practical Programmable Computers. Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou. 21 Feb 2025.
  • Low-Rank Thinning. Annabelle Michael Carrell, Albert Gong, Abhishek Shetty, Raaz Dwivedi, Lester W. Mackey. 17 Feb 2025.
  • Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation. Yang Cao, Zhao Song, Chiwun Yang. 01 Feb 2025.
  • Fast Gradient Computation for RoPE Attention in Almost Linear Time. Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song. 03 Jan 2025.
  • $k$NN Attention Demystified: A Theoretical Exploration for Scalable Transformers. Themistoklis Haris. 06 Nov 2024.
  • Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding. Yi Liu, Chengxin Li, Shoukun Xu, Jiawei Han. 19 Oct 2024.
  • Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study. Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song. 15 Oct 2024.
  • HSR-Enhanced Sparse Attention Acceleration. Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song. 14 Oct 2024.
  • Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes. Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou. 12 Oct 2024.
  • LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions. R. Kannan, Chiranjib Bhattacharyya, Praneeth Kacham, David P. Woodruff. 07 Oct 2024.
  • Fundamental Limitations on Subquadratic Alternatives to Transformers. Josh Alman, Hantao Yu. 05 Oct 2024.
  • Differentially Private Kernel Density Estimation. Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu. 03 Sep 2024.
  • A Tighter Complexity Analysis of SparseGPT. Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song. 22 Aug 2024.
  • M5: A Whole Genome Bacterial Encoder at Single Nucleotide Resolution. Agust Egilsson. 03 Jul 2024.
  • When big data actually are low-rank, or entrywise approximation of certain function-generated matrices. Stanislav Budzinskiy. 03 Jul 2024.
  • On the Role of Attention Masks and LayerNorm in Transformers. Xinyi Wu, A. Ajorlou, Yifei Wang, Stefanie Jegelka, Ali Jadbabaie. 29 May 2024.
  • Binary Hypothesis Testing for Softmax Models and Leverage Score Models. Yeqi Gao, Yuzhou Gu, Zhao Song. 09 May 2024.
  • Outlier-Efficient Hopfield Layers for Large Transformer-Based Models. Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Haozheng Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu. 04 Apr 2024.
  • Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models. Dennis Wu, Jerry Yao-Chieh Hu, Teng-Yun Hsiao, Han Liu. 04 Apr 2024.
  • Do Efficient Transformers Really Save Computation? Kai-Bo Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, Liwei Wang. 21 Feb 2024.
  • The I/O Complexity of Attention, or How Optimal is Flash Attention? Barna Saha, Christopher Ye. 12 Feb 2024.
  • SubGen: Token Generation in Sublinear Time and Memory. A. Zandieh, Insu Han, Vahab Mirrokni, Amin Karbasi. 08 Feb 2024.
  • The Fine-Grained Complexity of Gradient Computation for Training Large Language Models. Josh Alman, Zhao Song. 07 Feb 2024.
  • The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry. Michael Zhang, Kush S. Bhatia, Hermann Kumbong, Christopher Ré. 06 Feb 2024.
  • One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space. Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang. 24 Nov 2023.
  • Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training. Lianke Qin, Saayan Mitra, Zhao Song, Yuanyuan Yang, Dinesh Manocha. 19 Nov 2023.
  • Hardness of Low Rank Approximation of Entrywise Transformed Matrix Products. Tamás Sarlós, Xingyou Song, David P. Woodruff, Qiuyi Zhang. 03 Nov 2023.
  • The Expressibility of Polynomial based Attention Scheme. Zhao Song, Guangyi Xu, Junze Yin. 30 Oct 2023.
  • Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time. Zichang Liu, Jue Wang, Tri Dao, Dinesh Manocha, Binhang Yuan, ..., Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Ré, Beidi Chen. 26 Oct 2023.
  • HyperAttention: Long-context Attention in Near-Linear Time. Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, A. Zandieh. 09 Oct 2023.
  • How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation. Josh Alman, Zhao Song. 06 Oct 2023.
  • PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels. Praneeth Kacham, Vahab Mirrokni, Peilin Zhong. 02 Oct 2023.
  • Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network? Lianke Qin, Zhao Song, Baocheng Sun. 14 Sep 2023.
  • A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time. Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin. 14 Sep 2023.
  • Online Adaptive Mahalanobis Distance Estimation. Lianke Qin, Aravind Reddy, Zhao Song. 02 Sep 2023.
  • Solving Attention Kernel Regression Problem via Pre-conditioner. Zhao Song, Junze Yin, Licheng Zhang. 28 Aug 2023.
  • How to Protect Copyright Data in Optimization of Large Language Models? T. Chu, Zhao Song, Chiwun Yang. 23 Aug 2023.