Sparse Attention with Linear Units
arXiv:2104.07012 · 14 April 2021
Biao Zhang, Ivan Titov, Rico Sennrich
Papers citing "Sparse Attention with Linear Units" (10 of 10 papers shown)
Attention Condensation via Sparsity Induced Regularized Training
Eli Sason, Darya Frolova, Boris Nazarov, Felix Goldberd
03 Mar 2025 · 183 · 0 · 0

Mixture of Attentions For Speculative Decoding
Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar, Jun Wang
04 Oct 2024 · 76 · 4 · 0

Loki: Low-Rank Keys for Efficient Sparse Attention
Prajwal Singhania, Siddharth Singh, Shwai He, S. Feizi, A. Bhatele
04 Jun 2024 · 34 · 13 · 0

Towards Equipping Transformer with the Ability of Systematic Compositionality
Chen Huang, Peixin Qin, Wenqiang Lei, Jiancheng Lv
12 Dec 2023 · 27 · 1 · 0

A Study on ReLU and Softmax in Transformer
Kai Shen, Junliang Guo, Xuejiao Tan, Siliang Tang, Rui Wang, Jiang Bian
13 Feb 2023 · 24 · 53 · 0

The Devil in Linear Transformer
Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong
19 Oct 2022 · 36 · 70 · 0

Neural Architecture Search on Efficient Transformers and Beyond
Zexiang Liu, Dong Li, Kaiyue Lu, Zhen Qin, Weixuan Sun, Jiacheng Xu, Yiran Zhong
28 Jul 2022 · 35 · 19 · 0

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier [MoE]
12 Mar 2020 · 252 · 580 · 0

Text Summarization with Pretrained Encoders
Yang Liu, Mirella Lapata [MILM]
22 Aug 2019 · 258 · 1,432 · 0

Effective Approaches to Attention-based Neural Machine Translation
Thang Luong, Hieu H. Pham, Christopher D. Manning
17 Aug 2015 · 218 · 7,926 · 0