Fast Transformers with Clustered Attention

9 July 2020
Apoorv Vyas
Angelos Katharopoulos
François Fleuret
arXiv: 2007.04825

Papers citing "Fast Transformers with Clustered Attention"

50 / 58 papers shown
CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution
Xin Liu
Jie Liu
J. Tang
Gangshan Wu
SupR ViT
87
0
0
10 Mar 2025
Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Pingyi Chen
Zhongyi Shui
Chenglu Zhu
Lin Yang
MedIm
99
5
0
18 Oct 2024
ENACT: Entropy-based Clustering of Attention Input for Reducing the Computational Needs of Object Detection Transformers
Giorgos Savathrakis
Antonis Argyros
ViT
41
0
0
11 Sep 2024
CLIP-Decoder : ZeroShot Multilabel Classification using Multimodal CLIP Aligned Representation
Muhammad Ali
Salman Khan
VLM
133
15
0
21 Jun 2024
Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition
Lei Liu
Li Liu
Haizhou Li
82
7
0
31 Jan 2024
Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences
Yanming Kang
Giang Tran
H. Sterck
100
5
0
18 Oct 2023
MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers
Jakob Drachmann Havtorn
Amelie Royer
Tijmen Blankevoort
B. Bejnordi
81
8
0
05 Jul 2023
The emergence of clusters in self-attention dynamics
Borjan Geshkovski
Cyril Letrouit
Yury Polyanskiy
Philippe Rigollet
111
56
0
09 May 2023
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Amanda Bertsch
Uri Alon
Graham Neubig
Matthew R. Gormley
RALM
211
130
0
02 May 2023
Efficient Long Sequence Modeling via State Space Augmented Transformer
Simiao Zuo
Xiaodong Liu
Jian Jiao
Denis Xavier Charles
Eren Manavoglu
Tuo Zhao
Jianfeng Gao
175
37
0
15 Dec 2022
A Survey on Artificial Intelligence for Music Generation: Agents, Domains and Perspectives
Carlos Hernandez-Olivan
Javier Hernandez-Olivan
J. R. Beltrán
MGen
93
7
0
25 Oct 2022
Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for Long Sequences
Aosong Feng
Irene Li
Yuang Jiang
Rex Ying
79
18
0
21 Oct 2022
Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation
Botao Yu
Peiling Lu
Rui Wang
Wei Hu
Xu Tan
Wei Ye
Shikun Zhang
Tao Qin
Tie-Yan Liu
MGen
104
60
0
19 Oct 2022
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jinchao Zhang
Shuyang Jiang
Jiangtao Feng
Lin Zheng
Dianbo Sui
3DV
195
9
0
14 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng Zhang
Chao Zhang
Hanhua Hu
117
31
0
03 Oct 2022
TagRec++: Hierarchical Label Aware Attention Network for Question Categorization
Venktesh V
Mukesh Mohania
Vikram Goyal
BDL
60
2
0
10 Aug 2022
Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
T. Nguyen
Richard G. Baraniuk
Robert M. Kirby
Stanley J. Osher
Bao Wang
127
9
0
01 Aug 2022
Attention and Self-Attention in Random Forests
Lev V. Utkin
A. Konstantinov
73
7
0
09 Jul 2022
FL-Tuning: Layer Tuning for Feed-Forward Network in Transformer
Jingping Liu
Yuqiu Song
Kui Xue
Hongli Sun
Chao Wang
Lihan Chen
Haiyun Jiang
Jiaqing Liang
Tong Ruan
72
2
0
30 Jun 2022
Long Range Language Modeling via Gated State Spaces
Harsh Mehta
Ankit Gupta
Ashok Cutkosky
Behnam Neyshabur
Mamba
140
243
0
27 Jun 2022
Online Segmentation of LiDAR Sequences: Dataset and Algorithm
Romain Loiseau
Mathieu Aubry
Loïc Landrieu
3DPC
100
15
0
16 Jun 2022
Separable Self-attention for Mobile Vision Transformers
Sachin Mehta
Mohammad Rastegari
ViT MQ
105
265
0
06 Jun 2022
OnePose: One-Shot Object Pose Estimation without CAD Models
Jiaming Sun
Zihao Wang
Siyu Zhang
Xingyi He
Hongcheng Zhao
Guofeng Zhang
Xiaowei Zhou
181
159
0
24 May 2022
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
3DV
125
182
0
27 Apr 2022
Visual Attention Methods in Deep Learning: An In-Depth Survey
Mohammed Hassanin
Saeed Anwar
Ibrahim Radwan
Fahad Shahbaz Khan
Ajmal Mian
136
166
0
16 Apr 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
97
9
0
11 Apr 2022
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds
Chenhang He
Ruihuang Li
Shuai Li
Lei Zhang
ViT 3DPC
87
173
0
19 Mar 2022
cosFormer: Rethinking Softmax in Attention
Zhen Qin
Weixuan Sun
Huicai Deng
Dongxu Li
Yunshen Wei
Baohong Lv
Junjie Yan
Lingpeng Kong
Yiran Zhong
95
222
0
17 Feb 2022
Flowformer: Linearizing Transformers with Conservation Flows
Haixu Wu
Jialong Wu
Jiehui Xu
Jianmin Wang
Mingsheng Long
64
92
0
13 Feb 2022
GLassoformer: A Query-Sparse Transformer for Post-Fault Power Grid Voltage Prediction
Yunling Zheng
Carson Hu
Guang Lin
Meng Yue
Bao Wang
Jack Xin
114
3
0
22 Jan 2022
Transformer Uncertainty Estimation with Hierarchical Stochastic Attention
Jiahuan Pei
Cheng-Yu Wang
Gyuri Szarvas
65
23
0
27 Dec 2021
Efficient Visual Tracking with Exemplar Transformers
Philippe Blatter
Menelaos Kanakis
Martin Danelljan
Luc Van Gool
ViT
128
84
0
17 Dec 2021
A deep language model to predict metabolic network equilibria
François Charton
Amaury Hayat
Sean T. McQuade
Nathaniel J. Merrill
B. Piccoli
GNN
77
5
0
07 Dec 2021
Linear algebra with transformers
François Charton
AIMat
104
59
0
03 Dec 2021
Octree Transformer: Autoregressive 3D Shape Generation on Hierarchically Structured Sequences
Moritz Ibing
Gregor Kobsik
Leif Kobbelt
90
37
0
24 Nov 2021
Token Pooling in Vision Transformers
D. Marin
Jen-Hao Rick Chang
Anurag Ranjan
Anish K. Prabhu
Mohammad Rastegari
Oncel Tuzel
ViT
143
71
0
08 Oct 2021
ABC: Attention with Bounded-memory Control
Hao Peng
Jungo Kasai
Nikolaos Pappas
Dani Yogatama
Zhaofeng Wu
Lingpeng Kong
Roy Schwartz
Noah A. Smith
125
22
0
06 Oct 2021
Predicting Attention Sparsity in Transformers
Marcos Vinícius Treviso
António Góis
Patrick Fernandes
E. Fonseca
André F. T. Martins
154
14
0
24 Sep 2021
$\infty$-former: Infinite Memory Transformer
Pedro Henrique Martins
Zita Marinho
André F. T. Martins
98
11
0
01 Sep 2021
FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention
T. Nguyen
Vai Suliafu
Stanley J. Osher
Long Chen
Bao Wang
70
36
0
05 Aug 2021
Grid Partitioned Attention: Efficient Transformer Approximation with Inductive Bias for High Resolution Detail Generation
Nikolay Jetchev
Gökhan Yildirim
Christian Bracher
Roland Vollgraf
26
0
0
08 Jul 2021
Learned Token Pruning for Transformers
Sehoon Kim
Sheng Shen
D. Thorsley
A. Gholami
Woosuk Kwon
Joseph Hassoun
Kurt Keutzer
86
157
0
02 Jul 2021
Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
Shengjie Luo
Shanda Li
Tianle Cai
Di He
Dinglan Peng
Shuxin Zheng
Guolin Ke
Liwei Wang
Tie-Yan Liu
95
50
0
23 Jun 2021
Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation
Lei Ke
Xia Li
Martin Danelljan
Yu-Wing Tai
Chi-Keung Tang
Feng Yu
VOS
69
75
0
22 Jun 2021
Memory-efficient Transformers via Top-$k$ Attention
Ankit Gupta
Guy Dar
Shaya Goodman
David Ciprut
Jonathan Berant
MQ
98
60
0
13 Jun 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
202
1,147
0
08 Jun 2021
On the Expressive Power of Self-Attention Matrices
Valerii Likhosherstov
K. Choromanski
Adrian Weller
95
36
0
07 Jun 2021
Container: Context Aggregation Network
Peng Gao
Jiasen Lu
Hongsheng Li
Roozbeh Mottaghi
Aniruddha Kembhavi
ViT
106
72
0
02 Jun 2021
FNet: Mixing Tokens with Fourier Transforms
James Lee-Thorp
Joshua Ainslie
Ilya Eckstein
Santiago Ontanon
134
536
0
09 May 2021
Attention for Image Registration (AiR): an unsupervised Transformer approach
Zihao Wang
H. Delingette
ViT MedIm
29
7
0
05 May 2021