ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Efficient Transformers: A Survey (arXiv:2009.06732)
14 September 2020
Yi Tay · Mostafa Dehghani · Dara Bahri · Donald Metzler
Tags: VLM

Papers citing "Efficient Transformers: A Survey"

33 / 633 papers shown
MalBERT: Using Transformers for Cybersecurity and Malicious Software Detection
  Abir Rahali, M. Akhloufi · 05 Mar 2021 · 30 citations

Perceiver: General Perception with Iterative Attention
  Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, João Carreira · 04 Mar 2021 · 970 citations · Tags: VLM, ViT, MDE

Random Feature Attention
  Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong · 03 Mar 2021 · 346 citations

OmniNet: Omnidirectional Representations from Transformers
  Yi Tay, Mostafa Dehghani, V. Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler · 01 Mar 2021 · 26 citations

Single-Shot Motion Completion with Transformer
  Yinglin Duan, Tianyang Shi, Zhengxia Zou, Yenan Lin, Zhehui Qian, Bohan Zhang, U. Michigan · 01 Mar 2021 · 75 citations · Tags: ViT

When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
  Tao Lei · 24 Feb 2021 · 47 citations · Tags: RALM, VLM

Training Large-Scale News Recommenders with Pretrained Language Models in the Loop
  Shitao Xiao, Zheng Liu, Yingxia Shao, Tao Di, Xing Xie · 18 Feb 2021 · 41 citations · Tags: VLM, AIFin

LambdaNetworks: Modeling Long-Range Interactions Without Attention
  Irwan Bello · 17 Feb 2021 · 178 citations

Translational Equivariance in Kernelizable Attention
  Max Horn, Kumar Shridhar, Elrich Groenewald, Philipp F. M. Baumann · 15 Feb 2021 · 7 citations

Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits
  Leonid Boytsov, Zico Kolter · 12 Feb 2021 · 11 citations

HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition
  Avihay Chriqui, I. Yahav · 03 Feb 2021 · 36 citations

Bottleneck Transformers for Visual Recognition
  A. Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani · 27 Jan 2021 · 955 citations · Tags: SLR

Transformers in Vision: A Survey
  Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, F. Khan, M. Shah · 04 Jan 2021 · 2,404 citations · Tags: ViT

Improving reference mining in patents with BERT
  K. Voskuil, Suzan Verberne · 04 Jan 2021 · 10 citations

What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure
  Jui Shah, Yaman Kumar Singla, Changyou Chen, R. Shah · 02 Jan 2021 · 81 citations

A Survey on Deep Reinforcement Learning for Audio-Based Applications
  S. Latif, Heriberto Cuayáhuitl, Farrukh Pervez, Fahad Shamshad, Hafiz Shehbaz Ali, Erik Cambria · 01 Jan 2021 · 73 citations · Tags: OffRL

Shortformer: Better Language Modeling using Shorter Inputs
  Ofir Press, Noah A. Smith, M. Lewis · 31 Dec 2020 · 87 citations

Reservoir Transformers
  Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela · 30 Dec 2020 · 17 citations

RealFormer: Transformer Likes Residual Attention
  Ruining He, Anirudh Ravula, Bhargav Kanagal, Joshua Ainslie · 21 Dec 2020 · 106 citations

Sub-Linear Memory: How to Make Performers SLiM
  Valerii Likhosherstov, K. Choromanski, Jared Davis, Xingyou Song, Adrian Weller · 21 Dec 2020 · 19 citations

Noise-Robust End-to-End Quantum Control using Deep Autoregressive Policy Networks
  Jiahao Yao, Paul Köttering, Hans Gundlach, Lin Lin, Marin Bukov · 12 Dec 2020 · 14 citations

Long Range Arena: A Benchmark for Efficient Transformers
  Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, J. Rao, Liu Yang, Sebastian Ruder, Donald Metzler · 08 Nov 2020 · 689 citations

Recent Developments on ESPnet Toolkit Boosted by Conformer
  Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, ..., Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, Yuekai Zhang · 26 Oct 2020 · 261 citations

An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection
  Yin Cao, Turab Iqbal, Qiuqiang Kong, Y. Zhong, Wenwu Wang, Mark D. Plumbley · 25 Oct 2020 · 75 citations

Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search
  Gyuwan Kim, Kyunghyun Cho · 14 Oct 2020 · 92 citations

Deformable DETR: Deformable Transformers for End-to-End Object Detection
  Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai · 08 Oct 2020 · 4,882 citations · Tags: ViT

My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control
  Vitaly Kurin, Maximilian Igl, Tim Rocktaschel, Wendelin Boehmer, Shimon Whiteson · 05 Oct 2020 · 84 citations · Tags: AI4CE

Attention Meets Perturbations: Robust and Interpretable Attention with Adversarial Training
  Shunsuke Kitada, Hitoshi Iyatomi · 25 Sep 2020 · 25 citations · Tags: OOD, AAML

Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
  Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, S. Sun, Yu Cheng, Jingjing Liu · 13 Sep 2020 · 27 citations

Big Bird: Transformers for Longer Sequences
  Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed · 28 Jul 2020 · 1,982 citations · Tags: VLM

Efficient Content-Based Sparse Attention with Routing Transformers
  Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier · 12 Mar 2020 · 578 citations · Tags: MoE

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
  Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer · 12 Sep 2019 · 571 citations · Tags: MQ

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
  Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · 20 Apr 2018 · 6,927 citations · Tags: ELM