Efficient Transformers: A Survey

14 September 2020
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
VLM

Papers citing "Efficient Transformers: A Survey"

50 / 633 papers shown

SOFT: Softmax-free Transformer with Linear Complexity
Jiachen Lu, Jinghan Yao, Junge Zhang, Martin Danelljan, Hang Xu, Weiguo Gao, Chunjing Xu, Thomas B. Schon, Li Zhang
22 Oct 2021

Transformer Acceleration with Dynamic Sparse Attention
Liu Liu, Zheng Qu, Zhaodong Chen, Yufei Ding, Yuan Xie
21 Oct 2021

Compositional Attention: Disentangling Search and Retrieval
Sarthak Mittal, Sharath Chandra Raparthy, Irina Rish, Yoshua Bengio, Guillaume Lajoie
18 Oct 2021

Energon: Towards Efficient Acceleration of Transformers Using Dynamic Sparse Attention
Zhe Zhou, Junling Liu, Zhenyu Gu, Guangyu Sun
18 Oct 2021

Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models
Qinyuan Ye, Madian Khabsa, M. Lewis, Sinong Wang, Xiang Ren, Aaron Jaech
16 Oct 2021

On Learning the Transformer Kernel
Sankalan Pal Chowdhury, Adamos Solomou, Kumar Avinava Dubey, Mrinmaya Sachan
ViT
15 Oct 2021
DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization
Ziming Mao, Chen Henry Wu, Ansong Ni, Yusen Zhang, Rui Zhang, Tao Yu, Budhaditya Deb, Chenguang Zhu, Ahmed Hassan Awadallah, Dragomir R. Radev
15 Oct 2021

StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data
Victor Pellegrain, Myriam Tami, M. Batteux, Céline Hudelot
AI4TS
15 Oct 2021

How Does Momentum Benefit Deep Neural Networks Architecture Design? A Few Case Studies
Bao Wang, Hedi Xia, T. Nguyen, Stanley Osher
AI4CE
13 Oct 2021

Leveraging redundancy in attention with Reuse Transformers
Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar
13 Oct 2021
Pre-trained Language Models in Biomedical Domain: A Systematic Survey
Benyou Wang, Qianqian Xie, Jiahuan Pei, Zhihong Chen, Prayag Tiwari, Zhao Li, Jie Fu
LM&MA, AI4CE
11 Oct 2021

Token Pooling in Vision Transformers
D. Marin, Jen-Hao Rick Chang, Anurag Ranjan, Anish K. Prabhu, Mohammad Rastegari, Oncel Tuzel
ViT
08 Oct 2021

ABC: Attention with Bounded-memory Control
Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith
06 Oct 2021

Ripple Attention for Visual Perception with Sub-quadratic Complexity
Lin Zheng, Huijie Pan, Lingpeng Kong
06 Oct 2021

PoNet: Pooling Network for Efficient Token Mixing in Long Sequences
Chao-Hong Tan, Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Zhenhua Ling
ViT
06 Oct 2021

Understanding and Overcoming the Challenges of Efficient Transformer Quantization
Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
MQ
27 Sep 2021
Vision Transformer Hashing for Image Retrieval
S. Dubey, S. Singh, Wei Chu
ViT
26 Sep 2021

Long-Range Transformers for Dynamic Spatiotemporal Forecasting
J. E. Grigsby, Zhe Wang, Nam Nguyen, Yanjun Qi
AI4TS
24 Sep 2021

Named Entity Recognition and Classification on Historical Documents: A Survey
Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, A. Doucet
23 Sep 2021

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Yi Tay, Mostafa Dehghani, J. Rao, W. Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
22 Sep 2021

Audiomer: A Convolutional Transformer For Keyword Spotting
Surya Kant Sahu, Sai Mitheran, Juhi Kamdar, Meet Gandhi
21 Sep 2021

Survey: Transformer based Video-Language Pre-training
Ludan Ruan, Qin Jin
VLM, ViT
21 Sep 2021
General Cross-Architecture Distillation of Pretrained Language Models into Matrix Embeddings
Lukas Galke, Isabelle Cuber, Christophe Meyer, Henrik Ferdinand Nolscher, Angelina Sonderecker, A. Scherp
17 Sep 2021

SHAPE: Shifted Absolute Position Embedding for Transformers
Shun Kiyono, Sosuke Kobayashi, Jun Suzuki, Kentaro Inui
13 Sep 2021

Query-driven Segment Selection for Ranking Long Documents
Youngwoo Kim, Razieh Rahimi, Hamed Bonab, James Allan
RALM
10 Sep 2021

MATE: Multi-view Attention for Table Transformer Efficiency
Julian Martin Eisenschlos, Maharshi Gor, Thomas Müller, William W. Cohen
LMTD
09 Sep 2021

Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems
Potsawee Manakul, Mark J. F. Gales
08 Sep 2021

Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories
David Wilmot, Frank Keller
RALM, KELM
08 Sep 2021

PermuteFormer: Efficient Relative Position Encoding for Long Sequences
Peng-Jen Chen
06 Sep 2021
∞-former: Infinite Memory Transformer
Pedro Henrique Martins, Zita Marinho, André F. T. Martins
01 Sep 2021

SHIFT15M: Fashion-specific dataset for set-to-set matching with several distribution shifts
Masanari Kimura, Takuma Nakamura, Yuki Saito
OOD
30 Aug 2021

A Web Scale Entity Extraction System
Xuanting Cai, Quanbin Ma, Pan Li, Jianyu Liu, Qi Zeng, Zhengkan Yang, Pushkar Tripathi
27 Aug 2021

Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation
Samuel Cahyawijaya
24 Aug 2021

Smart Bird: Learnable Sparse Attention for Efficient and Effective Transformer
Chuhan Wu, Fangzhao Wu, Tao Qi, Binxing Jiao, Daxin Jiang, Yongfeng Huang, Xing Xie
20 Aug 2021

Fastformer: Additive Attention Can Be All You Need
Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang, Xing Xie
20 Aug 2021
FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention
T. Nguyen, Vai Suliafu, Stanley J. Osher, Long Chen, Bao Wang
05 Aug 2021

Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Yifan Xu, Zhijie Zhang, Mengdan Zhang, Kekai Sheng, Ke Li, Weiming Dong, Liqing Zhang, Changsheng Xu, Xing Sun
ViT
03 Aug 2021

Representation learning for neural population activity with Neural Data Transformers
Joel Ye, C. Pandarinath
AI4TS, AI4CE
02 Aug 2021

A Survey of Human-in-the-loop for Machine Learning
Xingjiao Wu, Luwei Xiao, Yixuan Sun, Junhang Zhang, Tianlong Ma, Liangbo He
SyDa
02 Aug 2021

Perceiver IO: A General Architecture for Structured Inputs & Outputs
Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, ..., Olivier J. Hénaff, M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira
MLLM, VLM, GNN
30 Jul 2021
H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences
Zhenhai Zhu, Radu Soricut
25 Jul 2021

Clinical Relation Extraction Using Transformer-based Models
Xi Yang, Zehao Yu, Yi Guo, Jiang Bian, Yonghui Wu
LM&MA, MedIm
19 Jul 2021

Video Crowd Localization with Multi-focus Gaussian Neighborhood Attention and a Large-Scale Benchmark
Haopeng Li, Lingbo Liu, Kunlin Yang, Shinan Liu, Junyuan Gao, Bin Zhao, Rui Zhang, Jun Hou
19 Jul 2021

STAR: Sparse Transformer-based Action Recognition
Feng Shi, Chonghan Lee, Liang Qiu, Yizhou Zhao, Tianyi Shen, Shivran Muralidhar, Tian Han, Song-Chun Zhu, V. Narayanan
ViT
15 Jul 2021

Efficient Transformer for Direct Speech Translation
Belen Alastruey, Gerard I. Gállego, Marta R. Costa-jussà
07 Jul 2021
Poly-NL: Linear Complexity Non-local Layers with Polynomials
F. Babiloni, Ioannis Marras, Filippos Kokkinos, Jiankang Deng, Grigorios G. Chrysos, S. Zafeiriou
06 Jul 2021

Clustering and attention model based for intelligent trading
Mimansa Rana, Nanxiang Mao, Ming Ao, Xiaohui Wu, Poning Liang, Matloob Khushi
06 Jul 2021

Vision Xformers: Efficient Attention for Image Classification
Pranav Jeevan, Amit Sethi
ViT
05 Jul 2021

A Primer on Pretrained Multilingual Language Models
Sumanth Doddapaneni, Gowtham Ramesh, Mitesh M. Khapra, Anoop Kunchukuttan, Pratyush Kumar
LRM
01 Jul 2021

Improving the Efficiency of Transformers for Resource-Constrained Devices
Hamid Tabani, Ajay Balasubramaniam, Shabbir Marzban, Elahe Arani, Bahram Zonooz
30 Jun 2021