arXiv:1905.07799
Adaptive Attention Span in Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
19 May 2019
Sainbayar Sukhbaatar
Edouard Grave
Piotr Bojanowski
Armand Joulin
Papers citing
"Adaptive Attention Span in Transformers"
50 of 201 papers shown
Sparse Meta Networks for Sequential Adaptation and its Application to Adaptive Language Modelling
Tsendsuren Munkhdalai
CLL
OffRL
211
5
0
03 Sep 2020
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Albert Gu
Tri Dao
Stefano Ermon
Atri Rudra
Christopher Ré
420
822
0
17 Aug 2020
Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size
Davis Yoshida
Allyson Ettinger
Kevin Gimpel
AI4CE
203
7
0
16 Aug 2020
Big Bird: Transformers for Longer Sequences
Neural Information Processing Systems (NeurIPS), 2020
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
1.3K
2,554
0
28 Jul 2020
Spatially Aware Multimodal Transformers for TextVQA
European Conference on Computer Vision (ECCV), 2020
Yash Kant
Dhruv Batra
Peter Anderson
Alex Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
209
94
0
23 Jul 2020
Conformer-Kernel with Query Term Independence for Document Retrieval
Bhaskar Mitra
Sebastian Hofstätter
Hamed Zamani
Nick Craswell
179
22
0
20 Jul 2020
Fast Transformers with Clustered Attention
Neural Information Processing Systems (NeurIPS), 2020
Apoorv Vyas
Angelos Katharopoulos
François Fleuret
290
172
0
09 Jul 2020
Do Transformers Need Deep Long-Range Memory?
Jack W. Rae
Ali Razavi
RALM
241
43
0
07 Jul 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov
Nikoli Dryden
Tal Ben-Nun
Shigang Li
Torsten Hoefler
418
169
0
30 Jun 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
François Fleuret
735
2,350
0
29 Jun 2020
Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions
Stephen Roller
Y-Lan Boureau
Jason Weston
Antoine Bordes
Emily Dinan
...
Kurt Shuster
Eric Michael Smith
Arthur Szlam
Jack Urbanek
Mary Williamson
LLMAG
AI4CE
237
60
0
22 Jun 2020
Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers
Tsung-Han Wu
Chun-Chen Hsieh
Yen-Hao Chen
Po-Han Chi
Hung-yi Lee
237
1
0
09 Jun 2020
O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Chulhee Yun
Yin-Wen Chang
Srinadh Bhojanapalli
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
235
94
0
08 Jun 2020
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Hanrui Wang
Zhanghao Wu
Zhijian Liu
Han Cai
Ligeng Zhu
Chuang Gan
Song Han
270
281
0
28 May 2020
Adaptive Transformers for Learning Multimodal Representations
Prajjwal Bhargava
117
5
0
15 May 2020
A Mixture of h-1 Heads is Better than h Heads
Hao Peng
Roy Schwartz
Dianqi Li
Noah A. Smith
MoE
176
35
0
13 May 2020
Multi-scale Transformer Language Models
Sandeep Subramanian
R. Collobert
Marc'Aurelio Ranzato
Y-Lan Boureau
143
18
0
01 May 2020
Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching
International Conference on Information and Knowledge Management (CIKM), 2020
Liu Yang
Mingyang Zhang
Cheng Li
Michael Bendersky
Marc Najork
268
96
0
26 Apr 2020
Lite Transformer with Long-Short Range Attention
International Conference on Learning Representations (ICLR), 2020
Zhanghao Wu
Zhijian Liu
Ji Lin
Chengyue Wu
Song Han
186
367
0
24 Apr 2020
On Sparsifying Encoder Outputs in Sequence-to-Sequence Models
Findings (Findings), 2020
Biao Zhang
Ivan Titov
Rico Sennrich
100
14
0
24 Apr 2020
Vector Quantized Contrastive Predictive Coding for Template-based Music Generation
Gaëtan Hadjeres
Léopold Crestel
204
19
0
21 Apr 2020
Adaptive Attention Span in Computer Vision
Jerrod Parker
Shakti Kumar
Joe Roussy
ViT
VLM
50
2
0
18 Apr 2020
ETC: Encoding Long and Structured Inputs in Transformers
Joshua Ainslie
Santiago Ontanon
Chris Alberti
Vaclav Cvicek
Zachary Kenneth Fisher
Philip Pham
Anirudh Ravula
Sumit Sanghai
Qifan Wang
Li Yang
318
56
0
17 Apr 2020
Training with Quantization Noise for Extreme Model Compression
International Conference on Learning Representations (ICLR), 2020
Angela Fan
Pierre Stock
Benjamin Graham
Edouard Grave
Rémi Gribonval
Armand Joulin
MQ
297
257
0
15 Apr 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
715
4,928
0
10 Apr 2020
Adaptive Transformers in RL
Shakti Kumar
Jerrod Parker
Panteha Naderian
OffRL
AI4CE
91
17
0
08 Apr 2020
SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
Neural Information Processing Systems (NeurIPS), 2020
Xiaoya Li
Yuxian Meng
Mingxin Zhou
Qinghong Han
Leilei Gan
Jiwei Li
286
22
0
22 Mar 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Transactions of the Association for Computational Linguistics (TACL), 2020
Aurko Roy
M. Saffar
Ashish Vaswani
David Grangier
MoE
994
693
0
12 Mar 2020
Meta-Embeddings Based On Self-Attention
Qichen Li
Xiaoke Jiang
Jun Xia
Jian Li
159
2
0
03 Mar 2020
Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Findings (Findings), 2020
Alessandro Raganato
Yves Scherrer
Jörg Tiedemann
383
96
0
24 Feb 2020
Addressing Some Limitations of Transformers with Feedback Memory
Angela Fan
Thibaut Lavril
Edouard Grave
Armand Joulin
Sainbayar Sukhbaatar
197
11
0
21 Feb 2020
Reformer: The Efficient Transformer
International Conference on Learning Representations (ICLR), 2020
Nikita Kitaev
Lukasz Kaiser
Anselm Levskaya
VLM
634
2,732
0
13 Jan 2020
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao
Junyang Lin
Zhiyuan Zhang
Xuancheng Ren
Qi Su
Xu Sun
192
139
0
25 Dec 2019
Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models
Computer Vision and Pattern Recognition (CVPR), 2019
Giannis Daras
Augustus Odena
Han Zhang
A. Dimakis
225
61
0
27 Nov 2019
Single Headed Attention RNN: Stop Thinking With Your Head
Stephen Merity
254
70
0
26 Nov 2019
Pre-Training of Deep Bidirectional Protein Sequence Representations with Structural Information
IEEE Access (IEEE Access), 2019
Seonwoo Min
Seunghyun Park
Siwon Kim
Hyun-Soo Choi
Byunghan Lee
Sungroh Yoon
SSL
337
63
0
25 Nov 2019
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning
Guangxiang Zhao
Xu Sun
Jingjing Xu
Zhiyuan Zhang
Liangchen Luo
LRM
181
60
0
17 Nov 2019
Compressive Transformers for Long-Range Sequence Modelling
International Conference on Learning Representations (ICLR), 2019
Jack W. Rae
Anna Potapenko
Siddhant M. Jayakumar
Timothy Lillicrap
RALM
VLM
KELM
311
778
0
13 Nov 2019
BP-Transformer: Modelling Long-Range Context via Binary Partitioning
Zihao Ye
Qipeng Guo
Quan Gan
Xipeng Qiu
Zheng Zhang
221
83
0
11 Nov 2019
Two-Headed Monster And Crossed Co-Attention Networks
Yaoyiran Li
Jing Jiang
150
0
0
10 Nov 2019
Location Attention for Extrapolation to Longer Sequences
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Yann Dubois
Gautier Dagan
Dieuwke Hupkes
Elia Bruni
221
46
0
10 Nov 2019
Improving Transformer Models by Reordering their Sublayers
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Ofir Press
Noah A. Smith
Omer Levy
169
93
0
10 Nov 2019
Blockwise Self-Attention for Long Document Understanding
Findings (Findings), 2019
J. Qiu
Hao Ma
Omer Levy
Scott Yih
Sinong Wang
Jie Tang
321
269
0
07 Nov 2019
Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Inputs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Angela Fan
Claire Gardent
Chloé Braud
Antoine Bordes
186
107
0
18 Oct 2019
When and Why is Document-level Context Useful in Neural Machine Translation?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Yunsu Kim
Thanh-Hai Tran
Hermann Ney
171
93
0
01 Oct 2019
Reducing Transformer Depth on Demand with Structured Dropout
International Conference on Learning Representations (ICLR), 2019
Angela Fan
Edouard Grave
Armand Joulin
636
662
0
25 Sep 2019
Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Jie Hao
Xing Wang
Shuming Shi
Jinfeng Zhang
Zhaopeng Tu
184
12
0
04 Sep 2019
Self-Attention with Structural Position Representations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Xing Wang
Zhaopeng Tu
Longyue Wang
Shuming Shi
MILM
188
75
0
01 Sep 2019
Adaptively Sparse Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Gonçalo M. Correia
Vlad Niculae
André F. T. Martins
352
280
0
30 Aug 2019
Augmenting Self-attention with Persistent Memory
Sainbayar Sukhbaatar
Edouard Grave
Guillaume Lample
Armand Joulin
RALM
KELM
231
152
0
02 Jul 2019
Page 4 of 5