Adaptive Attention Span in Transformers
Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin
Annual Meeting of the Association for Computational Linguistics (ACL), 2019 · 19 May 2019 · arXiv:1905.07799

Papers citing "Adaptive Attention Span in Transformers"

Showing 50 of 201 citing papers.

A Quantitative Review on Language Model Efficiency Research
Meng Jiang, Hy Dang, Lingbo Tong · 28 May 2023

Landmark Attention: Random-Access Infinite Context Length for Transformers
Amirkeivan Mohtashami, Martin Jaggi · Neural Information Processing Systems (NeurIPS), 2023 · 25 May 2023

Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xiang Wang, Jingwen Leng, Zhouhan Lin · Annual Meeting of the Association for Computational Linguistics (ACL), 2023 · 24 May 2023

Leveraging Synthetic Targets for Machine Translation
Sarthak Mittal, Oleksii Hrinchuk, Oleksii Kuchaiev · Annual Meeting of the Association for Computational Linguistics (ACL), 2023 · 07 May 2023

Leveraging BERT Language Model for Arabic Long Document Classification
Muhammad Al-Qurishi · 04 May 2023

Improving Autoregressive NLP Tasks via Modular Linearized Attention
Victor Agostinelli, Lizhong Chen · 17 Apr 2023

Accelerating Trajectory Generation for Quadrotors Using Transformers
Srinath Tankasala, Mitch Pryor · Conference on Learning for Dynamics & Control (L4DC), 2023 · 27 Mar 2023

Real-time speech enhancement with dynamic attention span
Chengyu Zheng, Yuan-yuan Zhou, Xiulian Peng, Yuanyuan Zhang, Yan Lu · IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 · 21 Feb 2023

HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images
Kun Li, G. Vosselman, M. Yang · ISPRS Journal of Photogrammetry and Remote Sensing (ISPRS J. Photogramm. Remote Sens.), 2023 · 23 Jan 2023

AttMEMO: Accelerating Transformers with Memoization on Big Memory Systems
Yuan Feng, Hyeran Jeon, F. Blagojevic, Cyril Guyot, Qing Li, Dong Li · 23 Jan 2023

Cramming: Training a Language Model on a Single GPU in One Day
Jonas Geiping, Tom Goldstein · International Conference on Machine Learning (ICML), 2022 · 28 Dec 2022

EIT: Enhanced Interactive Transformer
Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu · Annual Meeting of the Association for Computational Linguistics (ACL), 2022 · 20 Dec 2022

Convolution-enhanced Evolving Attention Networks
Yujing Wang, Yaming Yang, Zhuowan Li, Jiangang Bai, Mingliang Zhang, Xiangtai Li, Jiahao Yu, Ce Zhang, Gao Huang, Yu Tong · IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022 · 16 Dec 2022

Efficient Long Sequence Modeling via State Space Augmented Transformer
Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Xavier Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao · 15 Dec 2022

Transformers for End-to-End InfoSec Tasks: A Feasibility Study
Ethan M. Rudd, Mohammad Saidur Rahman, Philip Tully · 05 Dec 2022

Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan, Matan Kalman, Yossi Matias · International Conference on Machine Learning (ICML), 2022 · 30 Nov 2022

Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention
Wenhao Li, Xiaoyuan Yi, Jinyi Hu, Maosong Sun, Xing Xie · Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 · 14 Nov 2022

Efficiently Scaling Transformer Inference
Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, J. Dean · Conference on Machine Learning and Systems (MLSys), 2022 · 09 Nov 2022

Conversation-oriented ASR with multi-look-ahead CBS architecture
Huaibo Zhao, S. Fujie, Tetsuji Ogawa, Jin Sakuma, Yusuke Kida, Tetsunori Kobayashi · IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 · 02 Nov 2022

Salience Allocation as Guidance for Abstractive Summarization
Fei Wang, Kaiqiang Song, Hongming Zhang, Lifeng Jin, Sangwoo Cho, Wenlin Yao, Xiaoyang Wang, Muhao Chen, Dong Yu · Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 · 22 Oct 2022

Breaking BERT: Evaluating and Optimizing Sparsified Attention
Siddhartha Brahma, Polina Zablotskaia, David M. Mimno · 07 Oct 2022

Document-aware Positional Encoding and Linguistic-guided Encoding for Abstractive Multi-document Summarization
Congbo Ma, Wei Emma Zhang, Pitawelayalage Dasun Dileepa Pitawela, Yutong Qu, Haojie Zhuang, Hu Wang · IEEE International Joint Conference on Neural Networks (IJCNN), 2022 · 13 Sep 2022

Horizontal and Vertical Attention in Transformers
Litao Yu, Shuai Liu · 10 Jul 2022

Efficient Representation Learning via Adaptive Context Pooling
Chen Huang, Walter A. Talbott, Navdeep Jaitly, J. Susskind · International Conference on Machine Learning (ICML), 2022 · 05 Jul 2022

Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen, Ming Hu, Boyang Albert Li, Mohamed Elhoseiny · 01 Jun 2022

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré · Neural Information Processing Systems (NeurIPS), 2022 · 27 May 2022

X-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song, Heung-Chang Lee · 27 May 2022

Training Language Models with Memory Augmentation
Zexuan Zhong, Tao Lei, Danqi Chen · Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 · 25 May 2022

Adaptable Adapters
N. Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych · North American Chapter of the Association for Computational Linguistics (NAACL), 2022 · 03 May 2022

A survey on attention mechanisms for medical applications: are we moving towards better algorithms?
Tiago Gonçalves, Isabel Rio-Torto, Luís F. Teixeira, J. S. Cardoso · IEEE Access, 2022 · 26 Apr 2022

Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
Gen Luo, Weihao Ye, Xiaoshuai Sun, Yan Wang, Liujuan Cao, Yongjian Wu, Feiyue Huang, Rongrong Ji · IEEE Transactions on Image Processing (IEEE TIP), 2022 · 16 Apr 2022

LaMemo: Language Modeling with Look-Ahead Memory
Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, Zhipeng Hu, Shiyu Huang · North American Chapter of the Association for Computational Linguistics (NAACL), 2022 · 15 Apr 2022

A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith · International Conference on Language Resources and Evaluation (LREC), 2022 · 11 Apr 2022

COOL, a Context Outlooker, and its Application to Question Answering and other Natural Language Processing Tasks
Fangyi Zhu, See-Kiong Ng, S. Bressan · International Joint Conference on Artificial Intelligence (IJCAI), 2022 · 01 Apr 2022

Linearizing Transformer with Key-Value Memory
Yizhe Zhang, Deng Cai · Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 · 23 Mar 2022

DuMLP-Pin: A Dual-MLP-dot-product Permutation-invariant Network for Set Feature Extraction
Jiajun Fei, Ziyu Zhu, Wenlei Liu, Zhidong Deng, Mingyang Li, Huanjun Deng, Shuo Zhang · AAAI Conference on Artificial Intelligence (AAAI), 2022 · 08 Mar 2022

Mukayese: Turkish NLP Strikes Back
Ali Safaya, Emirhan Kurtuluş, Arda Göktoğan, Deniz Yuret · Findings, 2022 · 02 Mar 2022

Benchmark Assessment for DeepSpeed Optimization Library
G. Liang, I. Alsmadi · 12 Feb 2022

Learning strides in convolutional neural networks
Rachid Riad, O. Teboul, David Grangier, Neil Zeghidour · International Conference on Learning Representations (ICLR), 2022 · 03 Feb 2022

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Chao-Yuan Wu, Yanghao Li, K. Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer · Computer Vision and Pattern Recognition (CVPR), 2022 · 20 Jan 2022

SMDT: Selective Memory-Augmented Neural Document Translation
Xu Zhang, Jian Yang, Haoyang Huang, Shuming Ma, Dongdong Zhang, Jinlong Li, Furu Wei · 05 Jan 2022

Adaptive Token Sampling For Efficient Vision Transformers
Mohsen Fayyaz, Soroush Abbasi Koohpayegani, F. Jafari, Sunando Sengupta, Hamid Reza Vaezi Joze, Eric Sommerlade, Hamed Pirsiavash, Juergen Gall · 30 Nov 2021

Sparse is Enough in Scaling Transformers
Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Lukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva · 24 Nov 2021

Local Multi-Head Channel Self-Attention for Facial Expression Recognition
Roberto Pecoraro, Valerio Basile, Viviana Bono, Sara Gallo · 14 Nov 2021

Scatterbrain: Unifying Sparse and Low-rank Attention Approximation
Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré · Neural Information Processing Systems (NeurIPS), 2021 · 28 Oct 2021

Hierarchical Transformers Are More Efficient Language Models
Piotr Nawrot, Szymon Tworkowski, Michał Tyrolski, Lukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski · 26 Oct 2021

An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR
Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi · Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021 · 20 Oct 2021

GNN-LM: Language Modeling based on Global Contexts via GNN
Yuxian Meng, Shi Zong, Xiaoya Li, Xiaofei Sun, Tianwei Zhang, Leilei Gan, Jiwei Li · 17 Oct 2021

Efficient Training of Audio Transformers with Patchout
Khaled Koutini, Jan Schlüter, Hamid Eghbalzadeh, Gerhard Widmer · Interspeech, 2021 · 11 Oct 2021

Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling
Kyuhong Shim, Iksoo Choi, Wonyong Sung, Jungwook Choi · 07 Oct 2021

Page 2 of 5