ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Efficient Transformers: A Survey

14 September 2020
Yi Tay
Mostafa Dehghani
Dara Bahri
Donald Metzler
    VLM

Papers citing "Efficient Transformers: A Survey"

50 / 633 papers shown
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
50
105
0
30 Sep 2022
Transformers for Object Detection in Large Point Clouds
Felicia Ruppel
F. Faion
Claudius Gläser
Klaus C. J. Dietmayer
ViT
25
5
0
30 Sep 2022
From One to Many: Dynamic Cross Attention Networks for LiDAR and Camera Fusion
Rui Wan
Shuangjie Xu
Wei Wu
Xiaoyi Zou
Tongyi Cao
3DPC
12
4
0
25 Sep 2022
Efficient Long Sequential User Data Modeling for Click-Through Rate Prediction
Qiwei Chen
Yue Xu
Changhua Pei
Shanshan Lv
Tao Zhuang
Junfeng Ge
3DV
6
3
0
25 Sep 2022
Mega: Moving Average Equipped Gated Attention
Xuezhe Ma
Chunting Zhou
Xiang Kong
Junxian He
Liangke Gui
Graham Neubig
Jonathan May
Luke Zettlemoyer
12
182
0
21 Sep 2022
Adapting Pretrained Text-to-Text Models for Long Text Sequences
Wenhan Xiong
Anchit Gupta
Shubham Toshniwal
Yashar Mehdad
Wen-tau Yih
RALM
VLM
49
30
0
21 Sep 2022
Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design
Hongxiang Fan
Thomas C. P. Chau
Stylianos I. Venieris
Royson Lee
Alexandros Kouris
Wayne Luk
Nicholas D. Lane
Mohamed S. Abdelfattah
22
56
0
20 Sep 2022
Quantum Vision Transformers
El Amine Cherrat
Iordanis Kerenidis
Natansh Mathur
Jonas Landman
M. Strahm
Yun. Y Li
ViT
34
54
0
16 Sep 2022
BERT-based Ensemble Approaches for Hate Speech Detection
Khouloud Mnassri
P. Rajapaksha
R. Farahbakhsh
Noel Crespi
9
18
0
14 Sep 2022
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
D. Fox
LM&Ro
155
453
0
12 Sep 2022
Pre-Training a Graph Recurrent Network for Language Representation
Yile Wang
Linyi Yang
Zhiyang Teng
M. Zhou
Yue Zhang
GNN
15
1
0
08 Sep 2022
A Review of Sparse Expert Models in Deep Learning
W. Fedus
J. Dean
Barret Zoph
MoE
8
144
0
04 Sep 2022
Deep Sparse Conformer for Speech Recognition
Xianchao Wu
14
2
0
01 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
28
109
0
31 Aug 2022
Persistence Initialization: A novel adaptation of the Transformer architecture for Time Series Forecasting
Espen Haugsdal
Erlend Aune
M. Ruocco
AI4TS
AI4CE
14
14
0
30 Aug 2022
Expressing Multivariate Time Series as Graphs with Time Series Attention Transformer
W. Ng
K. Siu
Albert C. Cheung
Michael K. Ng
AI4TS
10
7
0
19 Aug 2022
Treeformer: Dense Gradient Trees for Efficient Attention Computation
Lovish Madaan
Srinadh Bhojanapalli
Himanshu Jain
Prateek Jain
19
6
0
18 Aug 2022
Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition
A. Andrusenko
R. Nasretdinov
A. Romanenko
8
18
0
16 Aug 2022
An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers
Chao Fang
Aojun Zhou
Zhongfeng Wang
MoE
17
52
0
12 Aug 2022
Deep is a Luxury We Don't Have
Ahmed Taha
Yen Nhi Truong Vu
Brent Mombourquette
Thomas P. Matthews
Jason Su
Sadanand Singh
ViT
MedIm
16
2
0
11 Aug 2022
Investigating Efficiently Extending Transformers for Long Input Summarization
Jason Phang
Yao-Min Zhao
Peter J. Liu
RALM
LLMAG
23
63
0
08 Aug 2022
3D Vision with Transformers: A Survey
Jean Lahoud
Jiale Cao
F. Khan
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
Ming Yang
ViT
MedIm
27
32
0
08 Aug 2022
Global Hierarchical Attention for 3D Point Cloud Analysis
Dan Jia
Alexander Hermans
Bastian Leibe
3DPC
21
0
0
07 Aug 2022
Robust RGB-D Fusion for Saliency Detection
Zongwei Wu
Shriarulmozhivarman Gobichettipalayam
Brahim Tamadazte
Guillaume Allibert
D. Paudel
C. Demonceaux
16
26
0
02 Aug 2022
Efficient Long-Text Understanding with Short-Text Models
Maor Ivgi
Uri Shaham
Jonathan Berant
VLM
22
75
0
01 Aug 2022
Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
T. Nguyen
Richard G. Baraniuk
Robert M. Kirby
Stanley J. Osher
Bao Wang
21
9
0
01 Aug 2022
A Survey of Learning on Small Data: Generalization, Optimization, and Challenge
Xiaofeng Cao
Weixin Bu
Sheng-Jun Huang
Minling Zhang
Ivor W. Tsang
Yew-Soon Ong
James T. Kwok
30
1
0
29 Jul 2022
Neural Architecture Search on Efficient Transformers and Beyond
Zexiang Liu
Dong Li
Kaiyue Lu
Zhen Qin
Weixuan Sun
Jiacheng Xu
Yiran Zhong
25
19
0
28 Jul 2022
Efficient High-Resolution Deep Learning: A Survey
Arian Bakhtiarnia
Qi Zhang
Alexandros Iosifidis
MedIm
11
17
0
26 Jul 2022
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Yi Tay
Mostafa Dehghani
Samira Abnar
Hyung Won Chung
W. Fedus
J. Rao
Sharan Narang
Vinh Q. Tran
Dani Yogatama
Donald Metzler
AI4CE
22
100
0
21 Jul 2022
TaDaa: real time Ticket Assignment Deep learning Auto Advisor for customer support, help desk, and issue ticketing systems
Leon Feng
J. Senapati
Bill Liu
13
6
0
18 Jul 2022
Mobile Keystroke Biometrics Using Transformers
Giuseppe Stragapede
Paula Delgado-Santos
Ruben Tolosana
R. Vera-Rodríguez
R. Guest
Aythami Morales
11
16
0
15 Jul 2022
Confident Adaptive Language Modeling
Tal Schuster
Adam Fisch
Jai Gupta
Mostafa Dehghani
Dara Bahri
Vinh Q. Tran
Yi Tay
Donald Metzler
43
159
0
14 Jul 2022
Rethinking Attention Mechanism in Time Series Classification
Bowen Zhao
Huanlai Xing
Xinhan Wang
Fuhong Song
Zhiwen Xiao
AI4TS
28
30
0
14 Jul 2022
Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection
Zhe Chen
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
10
11
0
14 Jul 2022
Wayformer: Motion Forecasting via Simple & Efficient Attention Networks
Nigamaa Nayakanti
Rami Al-Rfou
Aurick Zhou
Kratarth Goel
Khaled S. Refaat
Benjamin Sapp
AI4TS
40
234
0
12 Jul 2022
STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining
Liwei Guo
Wonkyo Choe
F. Lin
11
14
0
11 Jul 2022
Attention and Self-Attention in Random Forests
Lev V. Utkin
A. Konstantinov
21
3
0
09 Jul 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
19
142
0
06 Jul 2022
Pure Transformers are Powerful Graph Learners
Jinwoo Kim
Tien Dat Nguyen
Seonwoo Min
Sungjun Cho
Moontae Lee
Honglak Lee
Seunghoon Hong
19
187
0
06 Jul 2022
Softmax-free Linear Transformers
Jiachen Lu
Junge Zhang
Xiatian Zhu
Jianfeng Feng
Tao Xiang
Li Zhang
ViT
11
7
0
05 Jul 2022
Compute Cost Amortized Transformer for Streaming ASR
Yifan Xie
J. Macoskey
Martin H. Radfar
Feng-Ju Chang
Brian King
Ariya Rastrow
Athanasios Mouchtaris
Grant P. Strimel
17
7
0
05 Jul 2022
Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding
Leonid Boytsov
David Akinpelu
Tianyi Lin
Fangwei Gao
Yutian Zhao
Jeffrey Huang
Nipun Katyal
Eric Nyberg
31
9
0
04 Jul 2022
An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics
Huan Yee Koh
Jiaxin Ju
Ming Liu
Shirui Pan
73
122
0
03 Jul 2022
Long Range Language Modeling via Gated State Spaces
Harsh Mehta
Ankit Gupta
Ashok Cutkosky
Behnam Neyshabur
Mamba
26
231
0
27 Jun 2022
Vicinity Vision Transformer
Weixuan Sun
Zhen Qin
Huiyuan Deng
Jianyuan Wang
Yi Zhang
Kaihao Zhang
Nick Barnes
Stan Birchfield
Lingpeng Kong
Yiran Zhong
ViT
34
31
0
21 Jun 2022
Square One Bias in NLP: Towards a Multi-Dimensional Exploration of the Research Manifold
Sebastian Ruder
Ivan Vulić
Anders Søgaard
30
25
0
20 Jun 2022
Resource-Efficient Separation Transformer
Luca Della Libera
Cem Subakan
Mirco Ravanelli
Samuele Cornell
Frédéric Lepoutre
François Grondin
VLM
35
15
0
19 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
522
0
13 Jun 2022
ChordMixer: A Scalable Neural Attention Model for Sequences with Different Lengths
Ruslan Khalitov
Tong Yu
Lei Cheng
Zhirong Yang
22
12
0
12 Jun 2022