Efficient Transformers: A Survey
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler · VLM · 14 September 2020 · arXiv:2009.06732

Papers citing "Efficient Transformers: A Survey"
Showing 50 of 633 citing papers:
• Hardness of Low Rank Approximation of Entrywise Transformed Matrix Products
  Tamás Sarlós, Xingyou Song, David P. Woodruff, Qiuyi Zhang · 03 Nov 2023
• ForecastPFN: Synthetically-Trained Zero-Shot Forecasting
  Samuel Dooley, Gurnoor Singh Khurana, Chirag Mohapatra, Siddartha Naidu, Colin White · AI4TS · 03 Nov 2023
• PAUMER: Patch Pausing Transformer for Semantic Segmentation
  Evann Courdier, Prabhu Teja Sivaprasad, F. Fleuret · 01 Nov 2023
• SpecTr: Fast Speculative Decoding via Optimal Transport
  Ziteng Sun, A. Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix X. Yu · 23 Oct 2023
• Transformers for Trajectory Optimization with Application to Spacecraft Rendezvous
  T. Guffanti, Daniele Gammelli, Simone D'Amico, Marco Pavone · 20 Oct 2023
• Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences
  Yanming Kang, Giang Tran, H. Sterck · 18 Oct 2023
• Recasting Continual Learning as Sequence Modeling
  Soochan Lee, Jaehyeon Son, Gunhee Kim · CLL · 18 Oct 2023
• Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset
  Arthur Amalvy, Vincent Labatut, Richard Dufour · 16 Oct 2023
• Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability
  Ivan Lee, Nan Jiang, Taylor Berg-Kirkpatrick · 12 Oct 2023
• Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
  Huiyin Xue, Nikolaos Aletras · 11 Oct 2023
• Task-Adaptive Tokenization: Enhancing Long-Form Text Generation Efficacy in Mental Health and Beyond
  Siyang Liu, Naihao Deng, Sahand Sabour, Yilin Jia, Minlie Huang, Rada Mihalcea · 09 Oct 2023
• Retrieval meets Long Context Large Language Models
  Peng Xu, Wei Ping, Xianchao Wu, Lawrence C. McAfee, Chen Zhu, Zihan Liu, Sandeep Subramanian, Evelina Bakhturina, M. Shoeybi, Bryan Catanzaro · RALM, LRM · 04 Oct 2023
• Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
  Ido Amos, Jonathan Berant, Ankit Gupta · 04 Oct 2023
• Nugget: Neural Agglomerative Embeddings of Text
  Guanghui Qin, Benjamin Van Durme · 03 Oct 2023
• PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
  Praneeth Kacham, Vahab Mirrokni, Peilin Zhong · 02 Oct 2023
• Efficient Streaming Language Models with Attention Sinks
  Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis · AI4TS, RALM · 29 Sep 2023
• Transformer-VQ: Linear-Time Transformers via Vector Quantization
  Lucas D. Lingle · 28 Sep 2023
• Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation
  Zihan Liu, Zewei Sun, Shanbo Cheng, Shujian Huang, Mingxuan Wang · 25 Sep 2023
• On Sparse Modern Hopfield Model
  Jerry Yao-Chieh Hu, Donglin Yang, Dennis Wu, Chenwei Xu, Bo-Yu Chen, Han Liu · VLM · 22 Sep 2023
• SPION: Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling
  Bokyeong Yoon, Yoonsang Han, Gordon Euhyun Moon · 22 Sep 2023
• TiBGL: Template-induced Brain Graph Learning for Functional Neuroimaging Analysis
  Xiangzhu Meng, Wei Wei, Qiang Liu, Shu Wu, Liang Wang · 14 Sep 2023
• RT-LM: Uncertainty-Aware Resource Management for Real-Time Inference of Language Models
  Yufei Li, Zexin Li, Wei Yang, Cong Liu · 12 Sep 2023
• Content Reduction, Surprisal and Information Density Estimation for Long Documents
  Shaoxiong Ji, Wei Sun, Pekka Marttinen · 12 Sep 2023
• Mobile Vision Transformer-based Visual Object Tracking
  Goutam Yelluru Gopal, Maria A. Amer · 11 Sep 2023
• Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning
  Sungjun Cho, Seunghyuk Cho, Sungwoo Park, Hankook Lee, Honglak Lee, Moontae Lee · 08 Sep 2023
• Language Models for Novelty Detection in System Call Traces
  Quentin Fournier, Daniel Aloise, Leandro R. Costa · AI4TS · 05 Sep 2023
• A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking
  Lorenzo Papa, Paolo Russo, Irene Amerini, Luping Zhou · 05 Sep 2023
• Patient-specific, mechanistic models of tumor growth incorporating artificial intelligence and big data
  G. Lorenzo, Syed Rakin Ahmed, D. Hormuth, Brenna Vaughn, Jayashree Kalpathy-Cramer, Luis Solorio, T. Yankeelov, H. Gómez · AI4CE · 28 Aug 2023
• LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
  Yushi Bai, Xin Lv, Jiajie Zhang, Hong Lyu, Jiankai Tang, ..., Aohan Zeng, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li · LLMAG, RALM · 28 Aug 2023
• Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models
  Jan Warchocki, Teodor Oprescu, Yunhan Wang, Alexandru Damacus, Paul Misterka, Robert-Jan Bruintjes, A. Lengyel, Ombretta Strafforello, J. C. V. Gemert · 24 Aug 2023
• Sparks of Large Audio Models: A Survey and Outlook
  S. Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Yi Ren, ..., Wenwu Wang, Xulong Zhang, Roberto Togneri, Erik Cambria, Björn W. Schuller · LM&MA, AuLLM · 24 Aug 2023
• How Much Temporal Long-Term Context is Needed for Action Segmentation?
  Emad Bahrami Rad, Gianpiero Francesca, Juergen Gall · ViT · 22 Aug 2023
• A Lightweight Transformer for Faster and Robust EBSD Data Collection
  Harry Dong, S. Donegan, M. Shah, Yuejie Chi · 18 Aug 2023
• Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
  Tobias Christian Nauen, Sebastián M. Palacio, Federico Raue, Andreas Dengel · 18 Aug 2023
• Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals
  Running Zhao, Jiang-Tao Luca Yu, H. Zhao, Edith C. H. Ngai · 16 Aug 2023
• Attention Is Not All You Need Anymore
  Zhe Chen · 15 Aug 2023
• Optimizing a Transformer-based network for a deep learning seismic processing workflow
  R. Harsuko, T. Alkhalifah · 09 Aug 2023
• Sparse Binary Transformers for Multivariate Time Series Modeling
  Matt Gorbett, Hossein Shirazi, I. Ray · AI4TS · 09 Aug 2023
• LGViT: Dynamic Early Exiting for Accelerating Vision Transformer
  Guanyu Xu, Jiawei Hao, Li Shen, Han Hu, Yong Luo, Hui Lin, J. Shen · 01 Aug 2023
• Language models as master equation solvers
  Chuanbo Liu, Jin Wang · 29 Jul 2023
• Fading memory as inductive bias in residual recurrent networks
  I. Dubinin, Felix Effenberger · 27 Jul 2023
• Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation
  Hao Peng, Qingqing Cao, Jesse Dodge, Matthew E. Peters, Jared Fernandez, ..., Darrell Plessas, Iz Beltagy, Evan Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi · 19 Jul 2023
• No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
  Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner · 12 Jul 2023
• Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
  Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, ..., Avital Oliver, Piotr Padlewski, A. Gritsenko, Mario Lučić, N. Houlsby · ViT · 12 Jul 2023
• SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
  Titouan Parcollet, Rogier van Dalen, Shucong Zhang, S. Bhattacharya · 12 Jul 2023
• ResMatch: Residual Attention Learning for Local Feature Matching
  Yu-Chih Deng, Jiayi Ma · 3DPC · 11 Jul 2023
• Lost in the Middle: How Language Models Use Long Contexts
  Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang · RALM · 06 Jul 2023
• Scaling In-Context Demonstrations with Structured Attention
  Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang · LRM · 05 Jul 2023
• MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers
  Jakob Drachmann Havtorn, Amelie Royer, Tijmen Blankevoort, B. Bejnordi · 05 Jul 2023
• Sumformer: Universal Approximation for Efficient Transformers
  Silas Alberti, Niclas Dern, L. Thesing, Gitta Kutyniok · 05 Jul 2023