Sparse Attention with Linear Units
arXiv:2104.07012 · 14 April 2021
Biao Zhang, Ivan Titov, Rico Sennrich
Papers citing "Sparse Attention with Linear Units" (10 of 10 papers shown)
Attention Condensation via Sparsity Induced Regularized Training
Eli Sason, Darya Frolova, Boris Nazarov, Felix Goldberd
03 Mar 2025 · 183 · 0 · 0

Mixture of Attentions For Speculative Decoding
Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar, Jun Wang
04 Oct 2024 · 76 · 4 · 0

Loki: Low-Rank Keys for Efficient Sparse Attention
Prajwal Singhania, Siddharth Singh, Shwai He, S. Feizi, A. Bhatele
04 Jun 2024 · 34 · 13 · 0

Towards Equipping Transformer with the Ability of Systematic Compositionality
Chen Huang, Peixin Qin, Wenqiang Lei, Jiancheng Lv
12 Dec 2023 · 27 · 1 · 0

A Study on ReLU and Softmax in Transformer
Kai Shen, Junliang Guo, Xuejiao Tan, Siliang Tang, Rui Wang, Jiang Bian
13 Feb 2023 · 24 · 53 · 0

The Devil in Linear Transformer
Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong
19 Oct 2022 · 36 · 70 · 0

Neural Architecture Search on Efficient Transformers and Beyond
Zexiang Liu, Dong Li, Kaiyue Lu, Zhen Qin, Weixuan Sun, Jiacheng Xu, Yiran Zhong
28 Jul 2022 · 35 · 19 · 0

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier [MoE]
12 Mar 2020 · 252 · 580 · 0

Text Summarization with Pretrained Encoders
Yang Liu, Mirella Lapata [MILM]
22 Aug 2019 · 258 · 1,432 · 0

Effective Approaches to Attention-based Neural Machine Translation
Thang Luong, Hieu H. Pham, Christopher D. Manning
17 Aug 2015 · 218 · 7,926 · 0