Memory-efficient Transformers via Top-$k$ Attention
Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonathan Berant
13 June 2021
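For readers skimming this citation list, the headline idea is easy to state: for each query, keep only the $k$ largest query-key scores, mask the rest before the softmax, and aggregate values over that small support. The sketch below is a minimal PyTorch illustration of generic top-$k$ attention under our own function and variable names; it is not the authors' implementation, and it still materializes the full score matrix, whereas the paper's memory savings come from processing queries in chunks so only the top-$k$ entries per query are ever kept.

```python
import torch
import torch.nn.functional as F

def topk_attention(q, k, v, top_k=32):
    """Minimal sketch of top-k attention (illustrative, not the paper's code).

    q, k, v: tensors of shape (batch, seq_len, dim). For each query, only the
    top_k largest query-key scores are kept; the rest are masked to -inf before
    the softmax, so every query attends to at most top_k keys.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5      # (batch, q_len, k_len)
    top_k = min(top_k, scores.size(-1))
    top_vals, top_idx = scores.topk(top_k, dim=-1)   # per-query top-k scores
    masked = torch.full_like(scores, float("-inf"))
    masked = masked.scatter(-1, top_idx, top_vals)   # keep only top-k entries
    attn = F.softmax(masked, dim=-1)                 # zero weight elsewhere
    return attn @ v                                  # (batch, q_len, dim)

# Toy usage with random tensors.
q = torch.randn(2, 128, 64)
k = torch.randn(2, 128, 64)
v = torch.randn(2, 128, 64)
print(topk_attention(q, k, v, top_k=16).shape)       # torch.Size([2, 128, 64])
```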

Papers citing "Memory-efficient Transformers via Top-$k$ Attention"

31 papers shown.

Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding
Konstantin Berestizshevsky, Renzo Andri, Lukas Cavigelli
12 Feb 2025

ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles X. Ling, Boyu Wang
24 Jan 2025

Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models
Zhisong Zhang, Yan Wang, Xinting Huang, Tianqing Fang, H. Zhang, Chenlong Deng, Shuaiyi Li, Dong Yu
21 Dec 2024

$k$NN Attention Demystified: A Theoretical Exploration for Scalable Transformers
Themistoklis Haris
06 Nov 2024

LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions
R. Kannan, Chiranjib Bhattacharyya, Praneeth Kacham, David P. Woodruff
07 Oct 2024

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Tianqi Chen, Beidi Chen
20 Aug 2024

Pick of the Bunch: Detecting Infrared Small Targets Beyond Hit-Miss Trade-Offs via Selective Rank-Aware Attention
Yimian Dai, Peiwen Pan, Yulei Qian, Yuxuan Li, Xiang Li, Jian Yang, Huan Wan
07 Aug 2024

Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou, Zixia Jia, Zilong Zheng, Kewei Tu
24 Jun 2024

Loki: Low-Rank Keys for Efficient Sparse Attention
Prajwal Singhania, Siddharth Singh, Shwai He, S. Feizi, A. Bhatele
04 Jun 2024

Extended Mind Transformers
Phoebe Klett, Thomas Ahle
04 Jun 2024

MultiMax: Sparse and Multi-Modal Attention Learning
Yuxuan Zhou, Mario Fritz, M. Keuper
03 Jun 2024

IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs
Yuzhen Mao, Martin Ester, Ke Li
05 May 2024

What makes Models Compositional? A Theoretical View: With Supplement
Parikshit Ram, Tim Klinger, Alexander G. Gray
02 May 2024

LoMA: Lossless Compressed Memory Attention
Yumeng Wang, Zhenyang Xiao
16 Jan 2024

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia
23 Dec 2023

Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages
Andy Yang, David Chiang, Dana Angluin
21 Oct 2023

Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning
Mingde Zhao, Safa Alver, H. V. Seijen, Romain Laroche, Doina Precup, Yoshua Bengio
30 Sep 2023

Approximating ReLU on a Reduced Ring for Efficient MPC-based Private Inference
Kiwan Maeng, G. E. Suh
09 Sep 2023

BiFormer: Vision Transformer with Bi-Level Routing Attention
Lei Zhu, Xinjiang Wang, Zhanghan Ke, Wayne Zhang, Rynson W. H. Lau
15 Mar 2023

The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
Zong-xiao Li, Chong You, Srinadh Bhojanapalli, Daliang Li, A. S. Rawat, ..., Kenneth Q Ye, Felix Chern, Felix X. Yu, Ruiqi Guo, Surinder Kumar
12 Oct 2022

Pretraining the Vision Transformer using self-supervised methods for vision based Deep Reinforcement Learning
Manuel Goulão, Arlindo L. Oliveira
22 Sep 2022

Treeformer: Dense Gradient Trees for Efficient Attention Computation
Lovish Madaan, Srinadh Bhojanapalli, Himanshu Jain, Prateek Jain
18 Aug 2022

Diagonal State Spaces are as Effective as Structured State Spaces
Ankit Gupta, Albert Gu, Jonathan Berant
27 Mar 2022

Memorizing Transformers
Yuhuai Wu, M. Rabe, DeLesley S. Hutchins, Christian Szegedy
16 Mar 2022

Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models
Beren Millidge, Tommaso Salvatori, Yuhang Song, Thomas Lukasiewicz, Rafal Bogacz
09 Feb 2022

On Learning the Transformer Kernel
Sankalan Pal Chowdhury, Adamos Solomou, Kumar Avinava Dubey, Mrinmaya Sachan
15 Oct 2021

LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
17 Feb 2021

Decision Machines: An Extension of Decision Trees
Jinxiong Zhang
27 Jan 2021

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
28 Jul 2020

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
12 Mar 2020

Language Models as Knowledge Bases?
Fabio Petroni, Tim Rocktaschel, Patrick Lewis, A. Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel
03 Sep 2019