Adaptive Attention Span in Transformers

Annual Meeting of the Association for Computational Linguistics (ACL), 2019
19 May 2019
Sainbayar Sukhbaatar
Edouard Grave
Piotr Bojanowski
Armand Joulin
ArXiv (abs) | PDF | HTML

Papers citing "Adaptive Attention Span in Transformers"

50 / 201 papers shown
Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling
Tsendsuren Munkhdalai
03 Sep 2020
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Albert Gu
Tri Dao
Stefano Ermon
Atri Rudra
Christopher Ré
17 Aug 2020
Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size
Davis Yoshida
Allyson Ettinger
Kevin Gimpel
16 Aug 2020
Big Bird: Transformers for Longer Sequences
Neural Information Processing Systems (NeurIPS), 2020
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
28 Jul 2020
Spatially Aware Multimodal Transformers for TextVQA
European Conference on Computer Vision (ECCV), 2020
Yash Kant
Dhruv Batra
Peter Anderson
Alex Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
23 Jul 2020
Conformer-Kernel with Query Term Independence for Document Retrieval
Bhaskar Mitra
Sebastian Hofstätter
Hamed Zamani
Nick Craswell
20 Jul 2020
Fast Transformers with Clustered Attention
Neural Information Processing Systems (NeurIPS), 2020
Apoorv Vyas
Angelos Katharopoulos
François Fleuret
09 Jul 2020
Do Transformers Need Deep Long-Range Memory?
Jack W. Rae
Ali Razavi
07 Jul 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov
Nikoli Dryden
Tal Ben-Nun
Shigang Li
Torsten Hoefler
30 Jun 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
François Fleuret
29 Jun 2020
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions
Stephen Roller
Y-Lan Boureau
Jason Weston
Antoine Bordes
Emily Dinan
...
Kurt Shuster
Eric Michael Smith
Arthur Szlam
Jack Urbanek
Mary Williamson
22 Jun 2020
Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers
Tsung-Han Wu
Chun-Chen Hsieh
Yen-Hao Chen
Po-Han Chi
Hung-yi Lee
09 Jun 2020
$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Chulhee Yun
Yin-Wen Chang
Srinadh Bhojanapalli
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
08 Jun 2020
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Hanrui Wang
Zhanghao Wu
Zhijian Liu
Han Cai
Ligeng Zhu
Chuang Gan
Song Han
28 May 2020
Adaptive Transformers for Learning Multimodal Representations
Prajjwal Bhargava
15 May 2020
A Mixture of $h-1$ Heads is Better than $h$ Heads
Hao Peng
Roy Schwartz
Dianqi Li
Noah A. Smith
13 May 2020
Multi-scale Transformer Language Models
Sandeep Subramanian
R. Collobert
Marc'Aurelio Ranzato
Y-Lan Boureau
01 May 2020
Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching
International Conference on Information and Knowledge Management (CIKM), 2020
Liu Yang
Mingyang Zhang
Cheng Li
Michael Bendersky
Marc Najork
26 Apr 2020
Lite Transformer with Long-Short Range Attention
International Conference on Learning Representations (ICLR), 2020
Zhanghao Wu
Zhijian Liu
Ji Lin
Chengyue Wu
Song Han
24 Apr 2020
On Sparsifying Encoder Outputs in Sequence-to-Sequence Models
Findings, 2020
Biao Zhang
Ivan Titov
Rico Sennrich
24 Apr 2020
Vector Quantized Contrastive Predictive Coding for Template-based Music Generation
Gaëtan Hadjeres
Léopold Crestel
21 Apr 2020
Adaptive Attention Span in Computer Vision
Jerrod Parker
Shakti Kumar
Joe Roussy
18 Apr 2020
ETC: Encoding Long and Structured Inputs in Transformers
Joshua Ainslie
Santiago Ontanon
Chris Alberti
Vaclav Cvicek
Zachary Kenneth Fisher
Philip Pham
Anirudh Ravula
Sumit Sanghai
Qifan Wang
Li Yang
17 Apr 2020
Training with Quantization Noise for Extreme Model Compression
International Conference on Learning Representations (ICLR), 2020
Angela Fan
Pierre Stock
Benjamin Graham
Edouard Grave
Rémi Gribonval
Armand Joulin
15 Apr 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
10 Apr 2020
Adaptive Transformers in RL
Shakti Kumar
Jerrod Parker
Panteha Naderian
08 Apr 2020
SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
Neural Information Processing Systems (NeurIPS), 2020
Xiaoya Li
Yuxian Meng
Mingxin Zhou
Qinghong Han
Leilei Gan
Jiwei Li
22 Mar 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Transactions of the Association for Computational Linguistics (TACL), 2020
Aurko Roy
M. Saffar
Ashish Vaswani
David Grangier
12 Mar 2020
Meta-Embeddings Based On Self-Attention
Qichen Li
Xiaoke Jiang
Jun Xia
Jian Li
03 Mar 2020
Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Findings, 2020
Alessandro Raganato
Yves Scherrer
Jörg Tiedemann
24 Feb 2020
Addressing Some Limitations of Transformers with Feedback Memory
Angela Fan
Thibaut Lavril
Edouard Grave
Armand Joulin
Sainbayar Sukhbaatar
21 Feb 2020
Reformer: The Efficient Transformer
International Conference on Learning Representations (ICLR), 2020
Nikita Kitaev
Lukasz Kaiser
Anselm Levskaya
13 Jan 2020
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao
Junyang Lin
Zhiyuan Zhang
Xuancheng Ren
Qi Su
Xu Sun
25 Dec 2019
Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models
Computer Vision and Pattern Recognition (CVPR), 2019
Giannis Daras
Augustus Odena
Han Zhang
A. Dimakis
27 Nov 2019
Single Headed Attention RNN: Stop Thinking With Your Head
Stephen Merity
26 Nov 2019
Pre-Training of Deep Bidirectional Protein Sequence Representations with Structural Information
IEEE Access, 2019
Seonwoo Min
Seunghyun Park
Siwon Kim
Hyun-Soo Choi
Byunghan Lee
Sungroh Yoon
25 Nov 2019
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning
Guangxiang Zhao
Xu Sun
Jingjing Xu
Zhiyuan Zhang
Liangchen Luo
17 Nov 2019
Compressive Transformers for Long-Range Sequence Modelling
International Conference on Learning Representations (ICLR), 2019
Jack W. Rae
Anna Potapenko
Siddhant M. Jayakumar
Timothy Lillicrap
13 Nov 2019
BP-Transformer: Modelling Long-Range Context via Binary Partitioning
Zihao Ye
Qipeng Guo
Quan Gan
Xipeng Qiu
Zheng Zhang
11 Nov 2019
Two-Headed Monster And Crossed Co-Attention Networks
Yaoyiran Li
Jing Jiang
10 Nov 2019
Location Attention for Extrapolation to Longer Sequences
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Yann Dubois
Gautier Dagan
Dieuwke Hupkes
Elia Bruni
10 Nov 2019
Improving Transformer Models by Reordering their Sublayers
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Ofir Press
Noah A. Smith
Omer Levy
10 Nov 2019
Blockwise Self-Attention for Long Document Understanding
Findings, 2019
J. Qiu
Hao Ma
Omer Levy
Scott Yih
Sinong Wang
Jie Tang
07 Nov 2019
Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Inputs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Angela Fan
Claire Gardent
Chloé Braud
Antoine Bordes
18 Oct 2019
When and Why is Document-level Context Useful in Neural Machine Translation?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Yunsu Kim
Thanh-Hai Tran
Hermann Ney
01 Oct 2019
Reducing Transformer Depth on Demand with Structured Dropout
International Conference on Learning Representations (ICLR), 2019
Angela Fan
Edouard Grave
Armand Joulin
25 Sep 2019
Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Jie Hao
Xing Wang
Shuming Shi
Jinfeng Zhang
Zhaopeng Tu
04 Sep 2019
Self-Attention with Structural Position Representations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Xing Wang
Zhaopeng Tu
Longyue Wang
Shuming Shi
01 Sep 2019
Adaptively Sparse Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Gonçalo M. Correia
Vlad Niculae
André F. T. Martins
30 Aug 2019
Augmenting Self-attention with Persistent Memory
Sainbayar Sukhbaatar
Edouard Grave
Guillaume Lample
Armand Joulin
02 Jul 2019
Page 4 of 5