arXiv: 2104.07012 (v2, latest)
Sparse Attention with Linear Units
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
14 April 2021
Biao Zhang, Ivan Titov, Rico Sennrich
Papers citing "Sparse Attention with Linear Units" (27 papers)
KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference. H. Zhang, Chunwei Xia, Zheng Wang. 14 Nov 2025.
Exploring Superposition and Interference in State-of-the-Art Low-Parameter Vision Models. Lilian Hollard, Lucas Mohimont, N. Gaveau, L. Steffenel. 21 Jul 2025.
Dual Attention Residual U-Net for Accurate Brain Ultrasound Segmentation in IVH Detection. Dan Yuan, Yi Feng, Ziyun Tang. 23 May 2025.
TRA: Better Length Generalisation with Threshold Relative Attention. Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao. 29 Mar 2025.
Attention Condensation via Sparsity Induced Regularized Training. Eli Sason, Darya Frolova, Boris Nazarov, Felix Goldberd. 03 Mar 2025.
Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding. Konstantin Berestizshevsky, Renzo Andri, Lukas Cavigelli. 12 Feb 2025.
Mixture of Attentions For Speculative Decoding. Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar, Jun Wang. International Conference on Learning Representations (ICLR), 2024. 04 Oct 2024.
Knowledge Mechanisms in Large Language Models: A Survey and Perspective. Meng Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, ..., Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang. 22 Jul 2024.
Loki: Low-Rank Keys for Efficient Sparse Attention. Prajwal Singhania, Siddharth Singh, Shwai He, Soheil Feizi, A. Bhatele. 04 Jun 2024.
Mechanistic Interpretability for AI Safety -- A Review. Leonard Bereska, E. Gavves. 22 Apr 2024.
CCDSReFormer: Traffic Flow Prediction with a Criss-Crossed Dual-Stream Enhanced Rectified Transformer Model. Zhiqi Shao, Michael G. H. Bell, Ze Wang, Glenn Geers, Xusheng Yao, Junbin Gao. 26 Mar 2024.
AYDIV: Adaptable Yielding 3D Object Detection via Integrated Contextual Vision Transformer. Tanmoy Dam, Sanjay Bhargav Dharavath, Sameer Alam, Nimrod Lilith, Supriyo Chakraborty, Mir Feroskhan. 12 Feb 2024.
Unification of Symmetries Inside Neural Networks: Transformer, Feedforward and Neural ODE. Koji Hashimoto, Yuji Hirono, Akiyoshi Sannai. 04 Feb 2024.
A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE. Ikumi Okubo, Keisuke Sugiura, Hiroki Matsutani. 05 Jan 2024.
Low-latency Space-time Supersampling for Real-time Rendering. Ruian He, Shili Zhou, Yuqi Sun, Ri Cheng, Weimin Tan, Bo Yan. 18 Dec 2023.
Towards Equipping Transformer with the Ability of Systematic Compositionality. Chen Huang, Peixin Qin, Wenqiang Lei, Jiancheng Lv. AAAI Conference on Artificial Intelligence (AAAI), 2023. 12 Dec 2023.
Learning Section Weights for Multi-Label Document Classification. Maziar Moradi Fard, Paula Sorolla Bayod, Kiomars Motarjem, Mohammad Alian Nejadi, S. Akhondi, Camilo Thorne. 26 Nov 2023.
TransRadar: Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation. Yahia Dalbah, Jean Lahoud, Hisham Cholakkal. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023. 03 Oct 2023.
Learning Transformer Programs. Dan Friedman, Alexander Wettig, Danqi Chen. Neural Information Processing Systems (NeurIPS), 2023. 01 Jun 2023.
A Study on ReLU and Softmax in Transformer. Kai Shen, Junliang Guo, Xuejiao Tan, Siliang Tang, Rui Wang, Jiang Bian. 13 Feb 2023.
Abstractive Summarization Guided by Latent Hierarchical Document Structure. Yifu Qiu, Shay B. Cohen. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 17 Nov 2022.
Leveraging commonsense for object localisation in partial scenes. Francesco Giuliari, Geri Skenderi, Marco Cristani, Alessio Del Bue, Yiming Wang. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022. 01 Nov 2022.
The Devil in Linear Transformer. Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. 19 Oct 2022.
Neural Architecture Search on Efficient Transformers and Beyond. Zexiang Liu, Dong Li, Kaiyue Lu, Zhen Qin, Weixuan Sun, Jiacheng Xu, Yiran Zhong. 28 Jul 2022.
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers. Arda Sahiner, Tolga Ergen, Batu Mehmet Ozturkler, John M. Pauly, Morteza Mardani, Mert Pilanci. International Conference on Machine Learning (ICML), 2022. 17 May 2022.
Speeding Up Entmax. Maxat Tezekbayev, Vassilina Nikoulina, Matthias Gallé, Z. Assylbekov. 12 Nov 2021.
Predicting Attention Sparsity in Transformers. Marcos Vinícius Treviso, António Góis, Patrick Fernandes, E. Fonseca, André F. T. Martins. 24 Sep 2021.