arXiv:2009.06732
Efficient Transformers: A Survey
14 September 2020
Yi Tay
Mostafa Dehghani
Dara Bahri
Donald Metzler
VLM
Papers citing "Efficient Transformers: A Survey"
50 / 633 papers shown
A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
Suyu Ge
Xihui Lin
Yunan Zhang
Jiawei Han
Hao Peng
31
4
0
02 Oct 2024
Tracking objects that change in appearance with phase synchrony
Sabine Muzellec
Drew Linsley
A. Ashok
E. Mingolla
Girik Malik
Rufin VanRullen
Thomas Serre
21
1
0
02 Oct 2024
Cottention: Linear Transformers With Cosine Attention
Gabriel Mongaras
Trevor Dohm
Eric C. Larson
24
0
0
27 Sep 2024
Efficiently Dispatching Flash Attention For Partially Filled Attention Masks
Agniv Sharma
Jonas Geiping
14
0
0
23 Sep 2024
A framework for measuring the training efficiency of a neural architecture
Eduardo Cueto-Mendoza
John D. Kelleher
38
0
0
12 Sep 2024
Synthetic continued pretraining
Zitong Yang
Neil Band
Shuangping Li
Emmanuel Candès
Tatsunori Hashimoto
CLL
SyDa
36
11
0
11 Sep 2024
In Defense of RAG in the Era of Long-Context Language Models
Tan Yu
Anbang Xu
Rama Akkiraju
RALM
3DV
19
24
0
03 Sep 2024
A Dual-Path neural network model to construct the flame nonlinear thermoacoustic response in the time domain
Jiawei Wu
Teng Wang
Jiaqi Nan
Lijun Yang
Jingxuan Li
AI4CE
14
0
0
26 Aug 2024
Domain-specific long text classification from sparse relevant information
Célia D'Cruz
J. Bereder
Frédéric Precioso
Michel Riveill
16
0
0
23 Aug 2024
ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement
Eashan Adhikarla
Kai Zhang
John Nicholson
Brian D. Davison
Mamba
29
3
0
19 Aug 2024
HySem: A context length optimized LLM pipeline for unstructured tabular extraction
Narayanan PP
A. P. N. Iyer
36
0
0
18 Aug 2024
Post-Training Sparse Attention with Double Sparsity
Shuo Yang
Ying Sheng
Joseph E. Gonzalez
Ion Stoica
Lianmin Zheng
28
7
0
11 Aug 2024
Sampling Foundational Transformer: A Theoretical Perspective
Viet Anh Nguyen
Minh Lenhat
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong Son-Hy
42
0
0
11 Aug 2024
SAMSA: Efficient Transformer for Many Data Modalities
Minh Lenhat
Viet Anh Nguyen
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong Son-Hy
38
0
0
10 Aug 2024
NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time
Yilong Chen
Guoxia Wang
Junyuan Shang
Shiyao Cui
Zhenyu Zhang
Tingwen Liu
Shuohuan Wang
Yu Sun
Dianhai Yu
Hua-Hong Wu
24
14
0
07 Aug 2024
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Gagan Jain
Nidhi Hegde
Aditya Kusupati
Arsha Nagrani
Shyamal Buch
Prateek Jain
Anurag Arnab
Sujoy Paul
MoE
33
7
0
29 Jul 2024
Hybrid Dynamic Pruning: A Pathway to Efficient Transformer Inference
Ghadeer Jaradat
M. Tolba
Ghada Alsuhli
Hani Saleh
Mahmoud Al-Qutayri
Thanos Stouraitis
Baker Mohammad
29
0
0
17 Jul 2024
Exploring Quantization for Efficient Pre-Training of Transformer Language Models
Kamran Chitsaz
Quentin Fournier
Gonçalo Mordido
Sarath Chandar
MQ
41
3
0
16 Jul 2024
DeepGate3: Towards Scalable Circuit Representation Learning
Zhengyuan Shi
Ziyang Zheng
Sadaf Khan
Jianyuan Zhong
Min Li
Qiang Xu
GNN
AI4CE
24
8
0
15 Jul 2024
Graph Transformers: A Survey
Ahsan Shehzad
Feng Xia
Shagufta Abid
Ciyuan Peng
Shuo Yu
Dongyu Zhang
Karin Verspoor
AI4CE
29
9
0
13 Jul 2024
Accelerating the inference of string generation-based chemical reaction models for industrial applications
Mikhail Andronov
Natalia Andronova
Michael Wand
Jürgen Schmidhuber
Djork-Arné Clevert
AI4CE
23
3
0
12 Jul 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
48
112
0
11 Jul 2024
HDT: Hierarchical Document Transformer
Haoyu He
Markus Flicke
Jan Buchmann
Iryna Gurevych
Andreas Geiger
35
0
0
11 Jul 2024
Integer-only Quantized Transformers for Embedded FPGA-based Time-series Forecasting in AIoT
Tianheng Ling
Chao Qian
Gregor Schiele
AI4TS
MQ
16
1
0
06 Jul 2024
Wavelets Are All You Need for Autoregressive Image Generation
Wael Mattar
Idan Levy
Nir Sharon
S. Dekel
30
3
0
28 Jun 2024
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads
Ali Khaleghi Rahimian
Manish Kumar Govind
Subhajit Maity
Dominick Reilly
Christian Kümmerle
Srijan Das
A. Dutta
36
1
0
27 Jun 2024
Temporally Multi-Scale Sparse Self-Attention for Physical Activity Data Imputation
Hui Wei
Maxwell A. Xu
Colin Samplawski
James M. Rehg
Santosh Kumar
Benjamin M. Marlin
27
0
0
27 Jun 2024
Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings
Andrea Posada
Daniel Rueckert
Felix Meissen
Philip Müller
LM&MA
ELM
29
0
0
24 Jun 2024
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
Tianyu Fu
Haofeng Huang
Xuefei Ning
Genghan Zhang
Boju Chen
...
Shiyao Li
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MQ
44
16
0
21 Jun 2024
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish
Itamar Zimerman
Shady Abu Hussein
Nadav Cohen
Amir Globerson
Lior Wolf
Raja Giryes
Mamba
67
13
0
20 Jun 2024
Elliptical Attention
Stefan K. Nielsen
Laziz U. Abdullaev
R. Teo
Tan M. Nguyen
21
3
0
19 Jun 2024
From Text to Life: On the Reciprocal Relationship between Artificial Life and Large Language Models
Eleni Nisioti
Claire Glanois
Elias Najarro
Andrew Dai
Elliot Meyerson
J. Pedersen
Laetitia Teodorescu
Conor F. Hayes
Shyam Sudhakaran
Sebastian Risi
AI4CE
LM&Ro
35
2
0
14 Jun 2024
An Empirical Study of Mamba-based Language Models
R. Waleffe
Wonmin Byeon
Duncan Riach
Brandon Norick
V. Korthikanti
...
Vartika Singh
Jared Casper
Jan Kautz
M. Shoeybi
Bryan Catanzaro
54
62
0
12 Jun 2024
LoCoCo: Dropping In Convolutions for Long Context Compression
Ruisi Cai
Yuandong Tian
Zhangyang Wang
Beidi Chen
33
9
0
08 Jun 2024
Cryptocurrency Frauds for Dummies: How ChatGPT introduces us to fraud?
Wail Zellagui
Abdessamad Imine
Yamina Tadjeddine
26
0
0
05 Jun 2024
SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM
Quandong Wang
Yuxuan Yuan
Xiaoyu Yang
Ruike Zhang
Kang Zhao
Wei Liu
Jian Luan
Daniel Povey
Bin Wang
41
0
0
03 Jun 2024
ARCH2S: Dataset, Benchmark and Challenges for Learning Exterior Architectural Structures from Point Clouds
Ka Lung Cheung
Chi Chung Lee
3DV
3DPC
20
0
0
03 Jun 2024
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
Yilong Chen
Linhao Zhang
Junyuan Shang
Zhenyu Zhang
Tingwen Liu
Shuohuan Wang
Yu Sun
25
1
0
03 Jun 2024
Automatic Graph Topology-Aware Transformer
Chao Wang
Jiaxuan Zhao
Lingling Li
Licheng Jiao
Fang Liu
Shuyuan Yang
ViT
22
2
0
30 May 2024
Learning to Continually Learn with the Bayesian Principle
Soochan Lee
Hyeonseong Jeon
Jaehyeon Son
Gunhee Kim
BDL
CLL
24
2
0
29 May 2024
Benchmarking General-Purpose In-Context Learning
Fan Wang
Chuan Lin
Yang Cao
Yu Kang
30
1
0
27 May 2024
Activator: GLU Activation Function as the Core Component of a Vision Transformer
Abdullah Nazhat Abdullah
Tarkan Aydin
ViT
28
0
0
24 May 2024
Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges
Jonas Becker
Jan Philip Wahle
Bela Gipp
Terry Ruas
18
9
0
24 May 2024
Spectraformer: A Unified Random Feature Framework for Transformer
Duke Nguyen
Aditya Joshi
Flora D. Salim
29
0
0
24 May 2024
A review on the use of large language models as virtual tutors
Silvia García-Méndez
Francisco de Arriba-Pérez
Maria del Carmen Lopez-Perez
LLMAG
3DV
AI4Ed
VLM
KELM
19
16
0
20 May 2024
LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions
Victor Agostinelli
Sanghyun Hong
Lizhong Chen
KELM
27
1
0
18 May 2024
A Survey on Transformers in NLP with Focus on Efficiency
Wazib Ansar
Saptarsi Goswami
Amlan Chakrabarti
MedIm
27
2
0
15 May 2024
MambaOut: Do We Really Need Mamba for Vision?
Weihao Yu
Xinchao Wang
Mamba
39
46
0
13 May 2024
What makes Models Compositional? A Theoretical View: With Supplement
Parikshit Ram
Tim Klinger
Alexander G. Gray
CoGe
34
6
0
02 May 2024
Optimizing BioTac Simulation for Realistic Tactile Perception
W. Z. E. Amri
Nicolás Navarro
19
2
0
16 Apr 2024