The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax
Mimicry

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry

6 February 2024

Hermann Kumbong

Christopher Ré

Papers citing "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry"

13 / 13 papers shown

Title
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression Guihong Li Mehdi Rezagholizadeh Mingyu Yang Vikram Appia Emad Barsoum VLM 55 0 0 14 Mar 2025
Attention Condensation via Sparsity Induced Regularized Training Eli Sason Darya Frolova Boris Nazarov Felix Goldberd 113 0 0 03 Mar 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures Disen Lan Weigao Sun Jiaxi Hu Jusen Du Yu-Xi Cheng 64 0 0 03 Mar 2025
PolaFormer: Polarity-aware Linear Attention for Vision Transformers Weikang Meng Yadan Luo Xin Li D. Jiang Zheng Zhang 82 0 0 25 Jan 2025
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention Qiuhao Zeng Jerry Huang Peng Lu Gezheng Xu Boxing Chen Charles X. Ling Boyu Wang 45 1 0 24 Jan 2025
Tensor Product Attention Is All You Need Yifan Zhang Yifeng Liu Huizhuo Yuan Zhen Qin Yang Yuan Q. Gu Andrew Chi-Chih Yao 75 9 0 11 Jan 2025
HSR-Enhanced Sparse Attention Acceleration Bo Chen Yingyu Liang Zhizhou Sha Zhenmei Shi Zhao-quan Song 84 18 0 14 Oct 2024
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models N. Jha Brandon Reagen OffRL AI4CE 28 0 0 12 Oct 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity Mutian He Philip N. Garner 80 0 0 09 Oct 2024
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models Aviv Bick Kevin Y. Li Eric P. Xing J. Zico Kolter Albert Gu Mamba 43 24 0 19 Aug 2024
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights H. H. Mao 61 20 0 09 Oct 2022
On The Computational Complexity of Self-Attention Feyza Duman Keles Pruthuvi Maheshakya Wijewardena C. Hegde 63 107 0 11 Sep 2022
Emerging Properties in Self-Supervised Vision Transformers Mathilde Caron Hugo Touvron Ishan Misra Hervé Jégou Julien Mairal Piotr Bojanowski Armand Joulin 292 5,761 0 29 Apr 2021