2402.04347
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
6 February 2024
Michael Zhang
Kush S. Bhatia
Hermann Kumbong
Christopher Ré
Papers citing
"The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry"
13 / 13 papers shown
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li
Mehdi Rezagholizadeh
Mingyu Yang
Vikram Appia
Emad Barsoum
VLM
55
0
0
14 Mar 2025
Attention Condensation via Sparsity Induced Regularized Training
Eli Sason
Darya Frolova
Boris Nazarov
Felix Goldberg
113
0
0
03 Mar 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan
Weigao Sun
Jiaxi Hu
Jusen Du
Yu-Xi Cheng
64
0
0
03 Mar 2025
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng
Yadan Luo
Xin Li
D. Jiang
Zheng Zhang
82
0
0
25 Jan 2025
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng
Jerry Huang
Peng Lu
Gezheng Xu
Boxing Chen
Charles X. Ling
Boyu Wang
45
1
0
24 Jan 2025
Tensor Product Attention Is All You Need
Yifan Zhang
Yifeng Liu
Huizhuo Yuan
Zhen Qin
Yang Yuan
Q. Gu
Andrew Chi-Chih Yao
75
9
0
11 Jan 2025
HSR-Enhanced Sparse Attention Acceleration
Bo Chen
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao-quan Song
84
18
0
14 Oct 2024
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
N. Jha
Brandon Reagen
OffRL
AI4CE
28
0
0
12 Oct 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He
Philip N. Garner
80
0
0
09 Oct 2024
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aviv Bick
Kevin Y. Li
Eric P. Xing
J. Zico Kolter
Albert Gu
Mamba
43
24
0
19 Aug 2024
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
H. H. Mao
61
20
0
09 Oct 2022
On The Computational Complexity of Self-Attention
Feyza Duman Keles
Pruthuvi Maheshakya Wijewardena
C. Hegde
63
107
0
11 Sep 2022
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
292
5,761
0
29 Apr 2021