arXiv:2106.06899
Memory-efficient Transformers via Top-$k$ Attention
13 June 2021
Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonathan Berant
MQ
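For context on the technique named in the title: top-$k$ attention keeps, for each query, only the $k$ largest query-key scores and normalizes the softmax over just those, so the attention weights stay sparse. Below is a minimal single-head, unbatched PyTorch sketch of that mechanism; the function name and shapes are illustrative assumptions, not the authors' memory-efficient implementation (the paper additionally processes queries in chunks to bound memory).

    import torch

    def topk_attention(q, keys, values, k=32):
        # Toy top-k attention: each query attends only to the k keys
        # with the largest scaled dot-product scores.
        scores = q @ keys.T / (q.shape[-1] ** 0.5)           # (n_q, n_kv)
        top_scores, top_idx = torch.topk(scores, k, dim=-1)  # k largest per query
        weights = torch.softmax(top_scores, dim=-1)          # softmax over survivors only
        top_values = values[top_idx]                         # (n_q, k, d) matching value rows
        return (weights.unsqueeze(-1) * top_values).sum(dim=1)

    q = torch.randn(8, 64); kv = torch.randn(128, 64); v = torch.randn(128, 64)
    out = topk_attention(q, kv, v, k=16)                     # (8, 64)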
Papers citing "Memory-efficient Transformers via Top-$k$ Attention"
31 / 31 papers shown
Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding
Konstantin Berestizshevsky, Renzo Andri, Lukas Cavigelli
80 · 1 · 0 · 12 Feb 2025
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles X. Ling, Boyu Wang
47 · 1 · 0 · 24 Jan 2025
Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models
Zhisong Zhang, Yan Wang, Xinting Huang, Tianqing Fang, H. Zhang, Chenlong Deng, Shuaiyi Li, Dong Yu
80 · 2 · 0 · 21 Dec 2024
$k$NN Attention Demystified: A Theoretical Exploration for Scalable Transformers
Themistoklis Haris
31 · 0 · 0 · 06 Nov 2024
LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions
R. Kannan, Chiranjib Bhattacharyya, Praneeth Kacham, David P. Woodruff
23 · 0 · 0 · 07 Oct 2024
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Tianqi Chen, Beidi Chen
LRM · 31 · 22 · 0 · 20 Aug 2024
Pick of the Bunch: Detecting Infrared Small Targets Beyond Hit-Miss Trade-Offs via Selective Rank-Aware Attention
Yimian Dai, Peiwen Pan, Yulei Qian, Yuxuan Li, Xiang Li, Jian Yang, Huan Wan
20 · 8 · 0 · 07 Aug 2024
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou, Zixia Jia, Zilong Zheng, Kewei Tu
ODL · 31 · 18 · 0 · 24 Jun 2024
Loki: Low-Rank Keys for Efficient Sparse Attention
Prajwal Singhania, Siddharth Singh, Shwai He, S. Feizi, A. Bhatele
32 · 13 · 0 · 04 Jun 2024
Extended Mind Transformers
Phoebe Klett, Thomas Ahle
RALM · 21 · 0 · 0 · 04 Jun 2024
MultiMax: Sparse and Multi-Modal Attention Learning
Yuxuan Zhou, Mario Fritz, M. Keuper
40 · 1 · 0 · 03 Jun 2024
IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs
Yuzhen Mao, Martin Ester, Ke Li
30 · 6 · 0 · 05 May 2024
What makes Models Compositional? A Theoretical View: With Supplement
Parikshit Ram, Tim Klinger, Alexander G. Gray
CoGe · 34 · 6 · 0 · 02 May 2024
LoMA: Lossless Compressed Memory Attention
Yumeng Wang, Zhenyang Xiao
14 · 3 · 0 · 16 Jan 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia
61 · 76 · 0 · 23 Dec 2023
Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages
Andy Yang, David Chiang, Dana Angluin
28 · 14 · 0 · 21 Oct 2023
Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning
Mingde Zhao, Safa Alver, H. V. Seijen, Romain Laroche, Doina Precup, Yoshua Bengio
15 · 3 · 0 · 30 Sep 2023
Approximating ReLU on a Reduced Ring for Efficient MPC-based Private Inference
Kiwan Maeng, G. E. Suh
30 · 2 · 0 · 09 Sep 2023
BiFormer: Vision Transformer with Bi-Level Routing Attention
Lei Zhu, Xinjiang Wang, Zhanghan Ke, Wayne Zhang, Rynson W. H. Lau
126 · 480 · 0 · 15 Mar 2023
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, A. S. Rawat, ..., Ke Ye, Felix Chern, Felix X. Yu, Ruiqi Guo, Sanjiv Kumar
MoE · 25 · 87 · 0 · 12 Oct 2022
Pretraining the Vision Transformer using self-supervised methods for vision based Deep Reinforcement Learning
Manuel Goulão, Arlindo L. Oliveira
ViT · 25 · 6 · 0 · 22 Sep 2022
Treeformer: Dense Gradient Trees for Efficient Attention Computation
Lovish Madaan, Srinadh Bhojanapalli, Himanshu Jain, Prateek Jain
27 · 6 · 0 · 18 Aug 2022
Diagonal State Spaces are as Effective as Structured State Spaces
Ankit Gupta, Albert Gu, Jonathan Berant
34 · 290 · 0 · 27 Mar 2022
Memorizing Transformers
Yuhuai Wu, M. Rabe, DeLesley S. Hutchins, Christian Szegedy
RALM · 16 · 171 · 0 · 16 Mar 2022
Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models
Beren Millidge, Tommaso Salvatori, Yuhang Song, Thomas Lukasiewicz, Rafal Bogacz
VLM · 14 · 52 · 0 · 09 Feb 2022
On Learning the Transformer Kernel
Sankalan Pal Chowdhury, Adamos Solomou, Kumar Avinava Dubey, Mrinmaya Sachan
ViT · 44 · 14 · 0 · 15 Oct 2021
LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
260 · 179 · 0 · 17 Feb 2021
Decision Machines: An Extension of Decision Trees
Jinxiong Zhang
OffRL · 14 · 0 · 0 · 27 Jan 2021
Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
VLM · 251 · 2,012 · 0 · 28 Jul 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
MoE · 238 · 579 · 0 · 12 Mar 2020
Language Models as Knowledge Bases?
Fabio Petroni, Tim Rocktäschel, Patrick Lewis, A. Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel
KELM · AI4MH · 406 · 2,584 · 0 · 03 Sep 2019