Efficient Transformers: A Survey
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler · VLM · 14 September 2020 · arXiv:2009.06732

Papers citing "Efficient Transformers: A Survey"
Showing 50 of 633 citing papers:
• Hardness of Low Rank Approximation of Entrywise Transformed Matrix Products
  Tamás Sarlós, Xingyou Song, David P. Woodruff, Qiuyi Zhang · 03 Nov 2023
• ForecastPFN: Synthetically-Trained Zero-Shot Forecasting
  Samuel Dooley, Gurnoor Singh Khurana, Chirag Mohapatra, Siddartha Naidu, Colin White · AI4TS · 03 Nov 2023
• PAUMER: Patch Pausing Transformer for Semantic Segmentation
  Evann Courdier, Prabhu Teja Sivaprasad, F. Fleuret · 01 Nov 2023
• SpecTr: Fast Speculative Decoding via Optimal Transport
  Ziteng Sun, A. Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix X. Yu · 23 Oct 2023
• Transformers for Trajectory Optimization with Application to Spacecraft Rendezvous
  T. Guffanti, Daniele Gammelli, Simone D'Amico, Marco Pavone · 20 Oct 2023
• Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences
  Yanming Kang, Giang Tran, H. Sterck · 18 Oct 2023
• Recasting Continual Learning as Sequence Modeling
  Soochan Lee, Jaehyeon Son, Gunhee Kim · CLL · 18 Oct 2023
• Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset
  Arthur Amalvy, Vincent Labatut, Richard Dufour · 16 Oct 2023
• Is attention required for ICL? Exploring the Relationship Between Model Architecture and In-Context Learning Ability
  Ivan Lee, Nan Jiang, Taylor Berg-Kirkpatrick · 12 Oct 2023
• Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
  Huiyin Xue, Nikolaos Aletras · 11 Oct 2023
• Task-Adaptive Tokenization: Enhancing Long-Form Text Generation Efficacy in Mental Health and Beyond
  Siyang Liu, Naihao Deng, Sahand Sabour, Yilin Jia, Minlie Huang, Rada Mihalcea · 09 Oct 2023
• Retrieval meets Long Context Large Language Models
  Peng Xu, Wei Ping, Xianchao Wu, Lawrence C. McAfee, Chen Zhu, Zihan Liu, Sandeep Subramanian, Evelina Bakhturina, M. Shoeybi, Bryan Catanzaro · RALM, LRM · 04 Oct 2023
• Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
  Ido Amos, Jonathan Berant, Ankit Gupta · 04 Oct 2023
• Nugget: Neural Agglomerative Embeddings of Text
  Guanghui Qin, Benjamin Van Durme · 03 Oct 2023
• PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
  Praneeth Kacham, Vahab Mirrokni, Peilin Zhong · 02 Oct 2023
• Efficient Streaming Language Models with Attention Sinks
  Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis · AI4TS, RALM · 29 Sep 2023
• Transformer-VQ: Linear-Time Transformers via Vector Quantization
  Lucas D. Lingle · 28 Sep 2023
• Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation
  Zihan Liu, Zewei Sun, Shanbo Cheng, Shujian Huang, Mingxuan Wang · 25 Sep 2023
• On Sparse Modern Hopfield Model
  Jerry Yao-Chieh Hu, Donglin Yang, Dennis Wu, Chenwei Xu, Bo-Yu Chen, Han Liu · VLM · 22 Sep 2023
• SPION: Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling
  Bokyeong Yoon, Yoonsang Han, Gordon Euhyun Moon · 22 Sep 2023
• TiBGL: Template-induced Brain Graph Learning for Functional Neuroimaging Analysis
  Xiangzhu Meng, Wei Wei, Qiang Liu, Shu Wu, Liang Wang · 14 Sep 2023
• RT-LM: Uncertainty-Aware Resource Management for Real-Time Inference of Language Models
  Yufei Li, Zexin Li, Wei Yang, Cong Liu · 12 Sep 2023
• Content Reduction, Surprisal and Information Density Estimation for Long Documents
  Shaoxiong Ji, Wei Sun, Pekka Marttinen · 12 Sep 2023
• Mobile Vision Transformer-based Visual Object Tracking
  Goutam Yelluru Gopal, Maria A. Amer · 11 Sep 2023
• Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning
  Sungjun Cho, Seunghyuk Cho, Sungwoo Park, Hankook Lee, Honglak Lee, Moontae Lee · 08 Sep 2023
• Language Models for Novelty Detection in System Call Traces
  Quentin Fournier, Daniel Aloise, Leandro R. Costa · AI4TS · 05 Sep 2023
• A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking
  Lorenzo Papa, Paolo Russo, Irene Amerini, Luping Zhou · 05 Sep 2023
• Patient-specific, mechanistic models of tumor growth incorporating artificial intelligence and big data
  G. Lorenzo, Syed Rakin Ahmed, D. Hormuth, Brenna Vaughn, Jayashree Kalpathy-Cramer, Luis Solorio, T. Yankeelov, H. Gómez · AI4CE · 28 Aug 2023
• LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
  Yushi Bai, Xin Lv, Jiajie Zhang, Hong Lyu, Jiankai Tang, ..., Aohan Zeng, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li · LLMAG, RALM · 28 Aug 2023
• Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models
  Jan Warchocki, Teodor Oprescu, Yunhan Wang, Alexandru Damacus, Paul Misterka, Robert-Jan Bruintjes, A. Lengyel, Ombretta Strafforello, J. C. V. Gemert · 24 Aug 2023
• Sparks of Large Audio Models: A Survey and Outlook
  S. Latif, Moazzam Shoukat, Fahad Shamshad, Muhammad Usama, Yi Ren, ..., Wenwu Wang, Xulong Zhang, Roberto Togneri, Erik Cambria, Björn W. Schuller · LM&MA, AuLLM · 24 Aug 2023
• How Much Temporal Long-Term Context is Needed for Action Segmentation?
  Emad Bahrami Rad, Gianpiero Francesca, Juergen Gall · ViT · 22 Aug 2023
• A Lightweight Transformer for Faster and Robust EBSD Data Collection
  Harry Dong, S. Donegan, M. Shah, Yuejie Chi · 18 Aug 2023
• Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
  Tobias Christian Nauen, Sebastián M. Palacio, Federico Raue, Andreas Dengel · 18 Aug 2023
• Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals
  Running Zhao, Jiang-Tao Luca Yu, H. Zhao, Edith C. H. Ngai · 16 Aug 2023
• Attention Is Not All You Need Anymore
  Zhe Chen · 15 Aug 2023
• Optimizing a Transformer-based network for a deep learning seismic processing workflow
  R. Harsuko, T. Alkhalifah · 09 Aug 2023
• Sparse Binary Transformers for Multivariate Time Series Modeling
  Matt Gorbett, Hossein Shirazi, I. Ray · AI4TS · 09 Aug 2023
• LGViT: Dynamic Early Exiting for Accelerating Vision Transformer
  Guanyu Xu, Jiawei Hao, Li Shen, Han Hu, Yong Luo, Hui Lin, J. Shen · 01 Aug 2023
• Language models as master equation solvers
  Chuanbo Liu, Jin Wang · 29 Jul 2023
• Fading memory as inductive bias in residual recurrent networks
  I. Dubinin, Felix Effenberger · 27 Jul 2023
• Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation
  Hao Peng, Qingqing Cao, Jesse Dodge, Matthew E. Peters, Jared Fernandez, ..., Darrell Plessas, Iz Beltagy, Evan Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi · 19 Jul 2023
• No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
  Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner · 12 Jul 2023
• Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
  Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, ..., Avital Oliver, Piotr Padlewski, A. Gritsenko, Mario Lučić, N. Houlsby · ViT · 12 Jul 2023
• SummaryMixing: A Linear-Complexity Alternative to Self-Attention for Speech Recognition and Understanding
  Titouan Parcollet, Rogier van Dalen, Shucong Zhang, S. Bhattacharya · 12 Jul 2023
• ResMatch: Residual Attention Learning for Local Feature Matching
  Yu-Chih Deng, Jiayi Ma · 3DPC · 11 Jul 2023
• Lost in the Middle: How Language Models Use Long Contexts
  Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang · RALM · 06 Jul 2023
• Scaling In-Context Demonstrations with Structured Attention
  Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang · LRM · 05 Jul 2023
• MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers
  Jakob Drachmann Havtorn, Amelie Royer, Tijmen Blankevoort, B. Bejnordi · 05 Jul 2023
• Sumformer: Universal Approximation for Efficient Transformers
  Silas Alberti, Niclas Dern, L. Thesing, Gitta Kutyniok · 05 Jul 2023