Generating Long Sequences with Sparse Transformers (arXiv:1904.10509)
R. Child, Scott Gray, Alec Radford, Ilya Sutskever
23 April 2019

Papers citing "Generating Long Sequences with Sparse Transformers" (showing 50 of 1,283)

DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen
AAML · 618 · 3,417 · 0 · 05 Jun 2020

Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Peter Hawkins, Jared Davis, David Belanger, Lucy J. Colwell, Adrian Weller
359 · 93 · 0 · 05 Jun 2020

GMAT: Global Memory Augmentation for Transformers
Ankit Gupta, Jonathan Berant
RALM · 175 · 52 · 0 · 05 Jun 2020

Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL · 2.0K · 52,836 · 0 · 28 May 2020

The Cascade Transformer: an Application for Efficient Answer Sentence Selection
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Luca Soldaini, Alessandro Moschitti
172 · 45 · 0 · 05 May 2020

A Simple Language Model for Task-Oriented Dialogue
Neural Information Processing Systems (NeurIPS), 2020
Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, R. Socher
613 · 557 · 0 · 02 May 2020

Synthesizer: Rethinking Self-Attention in Transformer Models
International Conference on Machine Learning (ICML), 2020
Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng
299 · 382 · 0 · 02 May 2020

Multi-scale Transformer Language Models
Sandeep Subramanian, R. Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau
142 · 18 · 0 · 01 May 2020

Incremental Neural Coreference Resolution in Constant Memory
Patrick Xia, João Sedoc, Benjamin Van Durme
CLL · 176 · 3 · 0 · 30 Apr 2020

Jukebox: A Generative Model for Music
Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever
VLM · 560 · 902 · 0 · 30 Apr 2020

Multiresolution and Multimodal Speech Recognition with Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram
216 · 29 · 0 · 29 Apr 2020

Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching
International Conference on Information and Knowledge Management (CIKM), 2020
Liu Yang, Mingyang Zhang, Cheng Li, Michael Bendersky, Marc Najork
265 · 95 · 0 · 26 Apr 2020

Lite Transformer with Long-Short Range Attention
International Conference on Learning Representations (ICLR), 2020
Zhanghao Wu, Zhijian Liu, Ji Lin, Chengyue Wu, Song Han
181 · 364 · 0 · 24 Apr 2020

On Sparsifying Encoder Outputs in Sequence-to-Sequence Models
Findings, 2020
Biao Zhang, Ivan Titov, Rico Sennrich
97 · 14 · 0 · 24 Apr 2020

Vector Quantized Contrastive Predictive Coding for Template-based Music Generation
Gaëtan Hadjeres, Léopold Crestel
197 · 19 · 0 · 21 Apr 2020

A Spatio-temporal Transformer for 3D Human Motion Prediction
International Conference on 3D Vision (3DV), 2020
Emre Aksan, Manuel Kaufmann, Peng Cao, Otmar Hilliges
ViT · 395 · 278 · 0 · 18 Apr 2020

ETC: Encoding Long and Structured Inputs in Transformers
Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Kenneth Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang
318 · 56 · 0 · 17 Apr 2020

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan
RALM · VLM · 695 · 4,928 · 0 · 10 Apr 2020

Hierarchical Opacity Propagation for Image Matting
Yaoyi Li, Qin Xu, Hongtao Lu
181 · 14 · 0 · 07 Apr 2020

Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences
AAAI Conference on Artificial Intelligence (AAAI), 2020
Andis Draguns, Emīls Ozoliņš, A. Sostaks, Matiss Apinis, Kārlis Freivalds
288 · 10 · 0 · 06 Apr 2020

SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
Neural Information Processing Systems (NeurIPS), 2020
Xiaoya Li, Yuxian Meng, Mingxin Zhou, Qinghong Han, Leilei Gan, Jiwei Li
283 · 22 · 0 · 22 Mar 2020

Cross-Shape Attention for Part Segmentation of 3D Point Clouds
Marios Loizou, Siddhant Garg, Dmitry Petrov, Melinos Averkiou, E. Kalogerakis
3DPC · 379 · 4 · 0 · 20 Mar 2020

Transformer Networks for Trajectory Forecasting
International Conference on Pattern Recognition (ICPR), 2020
Francesco Giuliari, Irtiza Hasan, Marco Cristani, Fabio Galasso
387 · 478 · 0 · 18 Mar 2020

Efficient Content-Based Sparse Attention with Routing Transformers
Transactions of the Association for Computational Linguistics (TACL), 2020
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
MoE · 971 · 686 · 0 · 12 Mar 2020

ProGen: Language Modeling for Protein Generation
bioRxiv, 2020
Ali Madani, Bryan McCann, Nikhil Naik, N. Keskar, N. Anand, Raphael R. Eguchi, Po-Ssu Huang, R. Socher
225 · 317 · 0 · 08 Mar 2020

Meta-Embeddings Based On Self-Attention
Qichen Li, Xiaoke Jiang, Jun Xia, Jian Li
150 · 2 · 0 · 03 Mar 2020

Sparse Sinkhorn Attention
International Conference on Machine Learning (ICML), 2020
Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
219 · 373 · 0 · 26 Feb 2020

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Findings of EMNLP, 2020
Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
371 · 96 · 0 · 24 Feb 2020

PolyGen: An Autoregressive Generative Model of 3D Meshes
International Conference on Machine Learning (ICML), 2020
C. Nash, Yaroslav Ganin, A. Eslami, Peter W. Battaglia
AI4CE · 294 · 306 · 0 · 23 Feb 2020

Predictive Sampling with Forecasting Autoregressive Models
International Conference on Machine Learning (ICML), 2020
Auke Wiggers, Emiel Hoogeboom
BDL · 193 · 17 · 0 · 23 Feb 2020

Addressing Some Limitations of Transformers with Feedback Memory
Angela Fan, Thibaut Lavril, Edouard Grave, Armand Joulin, Sainbayar Sukhbaatar
182 · 11 · 0 · 21 Feb 2020

Low-Rank Bottleneck in Multi-head Attention Models
International Conference on Machine Learning (ICML), 2020
Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
189 · 122 · 0 · 17 Feb 2020

On Layer Normalization in the Transformer Architecture
International Conference on Machine Learning (ICML), 2020
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
AI4CE · 420 · 1,238 · 0 · 12 Feb 2020

Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow
Neural Information Processing Systems (NeurIPS), 2020
Didrik Nielsen, Ole Winther
MQ · 433 · 13 · 0 · 06 Feb 2020

Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions
Yu-Siang Huang, Yi-Hsuan Yang
ViT · 232 · 39 · 0 · 01 Feb 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
1.8K · 6,759 · 0 · 23 Jan 2020

Reformer: The Efficient Transformer
International Conference on Learning Representations (ICLR), 2020
Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya
VLM · 634 · 2,712 · 0 · 13 Jan 2020

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun
184 · 136 · 0 · 25 Dec 2019

Axial Attention in Multidimensional Transformers
Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans
266 · 618 · 0 · 20 Dec 2019

Not All Attention Is Needed: Gated Attention Network for Sequence Data
AAAI Conference on Artificial Intelligence (AAAI), 2020
Lanqing Xue, Xiaopeng Li, Ningyu Zhang
149 · 41 · 0 · 01 Dec 2019

Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models
Computer Vision and Pattern Recognition (CVPR), 2020
Giannis Daras, Augustus Odena, Han Zhang, A. Dimakis
222 · 61 · 0 · 27 Nov 2019

Single Headed Attention RNN: Stop Thinking With Your Head
Stephen Merity
253 · 70 · 0 · 26 Nov 2019

MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning
Guangxiang Zhao, Xu Sun, Jingjing Xu, Zhiyuan Zhang, Liangchen Luo
LRM · 168 · 60 · 0 · 17 Nov 2019

Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Ruizhe Zhao, Brian K. Vogel, Tanvir Ahmed, Wayne Luk
147 · 39 · 0 · 14 Nov 2019

Compressive Transformers for Long-Range Sequence Modelling
International Conference on Learning Representations (ICLR), 2020
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap
RALM · VLM · KELM · 297 · 774 · 0 · 13 Nov 2019

word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement
International Conference on Learning Representations (ICLR), 2020
Aliakbar Panahi, Seyran Saeedi, Tom Arodz
130 · 42 · 0 · 12 Nov 2019

BP-Transformer: Modelling Long-Range Context via Binary Partitioning
Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang
215 · 83 · 0 · 11 Nov 2019

Blockwise Self-Attention for Long Document Understanding
Findings of EMNLP, 2020
J. Qiu, Hao Ma, Omer Levy, Scott Yih, Sinong Wang, Jie Tang
306 · 269 · 0 · 07 Nov 2019

Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding
Pan Zhou, Ruchao Fan, Wei Chen, Jia Jia
299 · 26 · 0 · 01 Nov 2019

Injecting Hierarchy with U-Net Transformers
David Donahue, Vladislav Lialin, Anna Rumshisky
AI4CE · 139 · 2 · 0 · 16 Oct 2019

Page 25 of 26