Sparse Attention with Linear Units

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

14 April 2021

Papers citing "Sparse Attention with Linear Units"

27 / 27 papers shown

KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference

349

14 Nov 2025

Exploring Superposition and Interference in State-of-the-Art Low-Parameter Vision Models

243

21 Jul 2025

Dual Attention Residual U-Net for Accurate Brain Ultrasound Segmentation in IVH Detection

Dan Yuan

Yi Feng

Ziyun Tang

574

23 May 2025

TRA: Better Length Generalisation with Threshold Relative Attention

547

29 Mar 2025

Attention Condensation via Sparsity Induced Regularized Training

1.0K

03 Mar 2025

Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding

Konstantin Berestizshevsky

Renzo Andri

Lukas Cavigelli

420

12 Feb 2025

Mixture of Attentions For Speculative Decoding

International Conference on Learning Representations (ICLR), 2024

Matthieu Zimmer

Milan Gritta

Gerasimos Lampouras

Haitham Bou Ammar

Jun Wang

340

04 Oct 2024

Knowledge Mechanisms in Large Language Models: A Survey and Perspective

Shumin Deng

...

Yong Jiang

Pengjun Xie

Fei Huang

Huajun Chen

Ningyu Zhang

332

22 Jul 2024

Loki: Low-Rank Keys for Efficient Sparse Attention

263

04 Jun 2024

Mechanistic Interpretability for AI Safety -- A Review

Leonard Bereska

E. Gavves

AI4CE

335

298

22 Apr 2024

CCDSReFormer: Traffic Flow Prediction with a Criss-Crossed Dual-Stream Enhanced Rectified Transformer Model

Ze Wang

Junbin Gao

208

26 Mar 2024

AYDIV: Adaptable Yielding 3D Object Detection via Integrated Contextual Vision Transformer

Tanmoy Dam

Sanjay Bhargav Dharavath

232

12 Feb 2024

Unification of Symmetries Inside Neural Networks: Transformer, Feedforward and Neural ODE

256

04 Feb 2024

A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE

Ikumi Okubo

Keisuke Sugiura

Hiroki Matsutani

240

05 Jan 2024

Low-latency Space-time Supersampling for Real-time Rendering

Ruian He

Weimin Tan

137

18 Dec 2023

Towards Equipping Transformer with the Ability of Systematic Compositionality

AAAI Conference on Artificial Intelligence (AAAI), 2023

237

12 Dec 2023

Learning Section Weights for Multi-Label Document Classification

Maziar Moradi Fard

Paula Sorolla Bayod

Kiomars Motarjem

Mohammad Alian Nejadi

S. Akhondi

Camilo Thorne

187

26 Nov 2023

TransRadar: Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Yahia Dalbah

Jean Lahoud

Hisham Cholakkal

247

03 Oct 2023

Learning Transformer Programs

Neural Information Processing Systems (NeurIPS), 2023

Dan Friedman

Alexander Wettig

Danqi Chen

292

01 Jun 2023

A Study on ReLU and Softmax in Transformer

Junliang Guo

Jiang Bian

222

13 Feb 2023

Abstractive Summarization Guided by Latent Hierarchical Document Structure

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Yifu Qiu

Shay B. Cohen

220

17 Nov 2022

Leveraging commonsense for object localisation in partial scenes

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

203

01 Nov 2022

The Devil in Linear Transformer

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Zhen Qin

Lingpeng Kong

210

19 Oct 2022

Neural Architecture Search on Efficient Transformers and Beyond

Zhen Qin

208

28 Jul 2022

Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

International Conference on Machine Learning (ICML), 2022

Arda Sahiner

Tolga Ergen

Batu Mehmet Ozturkler

John M. Pauly

Morteza Mardani

Mert Pilanci

311

17 May 2022

128

12 Nov 2021

Predicting Attention Sparsity in Transformers

Marcos Vinícius Treviso

António Góis

Patrick Fernandes

E. Fonseca

André F. T. Martins

370

24 Sep 2021