1904.10509
Generating Long Sequences with Sparse Transformers
23 April 2019
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
Papers citing "Generating Long Sequences with Sparse Transformers"
32 / 1,282 papers shown
Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization
Conference on Machine Learning and Systems (MLSys), 2019
Paras Jain
Ajay Jain
Aniruddha Nrusimha
A. Gholami
Pieter Abbeel
Kurt Keutzer
Ion Stoica
Joseph E. Gonzalez
240
230
0
07 Oct 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
International Conference on Learning Representations (ICLR), 2019
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
1.1K
7,141
0
26 Sep 2019
Exascale Deep Learning for Scientific Inverse Problems
N. Laanait
Josh Romero
Junqi Yin
M. T. Young
Sean Treichler
V. Starchenko
A. Borisevich
Alexander Sergeev
Michael A. Matheson
FedML
BDL
163
29
0
24 Sep 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
1.2K
2,425
0
17 Sep 2019
CTRL: A Conditional Transformer Language Model for Controllable Generation
N. Keskar
Bryan McCann
Lav Varshney
Caiming Xiong
R. Socher
AI4CE
869
1,362
0
11 Sep 2019
Forecaster: A Graph Transformer for Forecasting Spatial and Time-Dependent Data
European Conference on Artificial Intelligence (ECAI), 2019
Yongqian Li
J. M. F. Moura
AI4TS
235
39
0
09 Sep 2019
Deep Equilibrium Models
Neural Information Processing Systems (NeurIPS), 2019
Shaojie Bai
J. Zico Kolter
V. Koltun
221
773
0
03 Sep 2019
Logic and the 2-Simplicial Transformer
International Conference on Learning Representations (ICLR), 2019
James Clift
D. Doryn
Daniel Murfet
James Wallbridge
NAI
146
6
0
02 Sep 2019
Adaptively Sparse Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Gonçalo M. Correia
Vlad Niculae
André F. T. Martins
341
277
0
30 Aug 2019
Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Yao-Hung Hubert Tsai
Shaojie Bai
M. Yamada
Louis-Philippe Morency
Ruslan Salakhutdinov
490
297
0
30 Aug 2019
Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Biao Zhang
Ivan Titov
Rico Sennrich
184
115
0
29 Aug 2019
BERT for Coreference Resolution: Baselines and Analysis
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Mandar Joshi
Omer Levy
Daniel S. Weld
Luke Zettlemoyer
349
339
0
24 Aug 2019
Interlaced Sparse Self-Attention for Semantic Segmentation
Lang Huang
Yuhui Yuan
Jianyuan Guo
Chao Zhang
Xilin Chen
Jingdong Wang
222
174
0
29 Jul 2019
Self-Attentional Credit Assignment for Transfer in Reinforcement Learning
Johan Ferret
Raphaël Marinier
Matthieu Geist
Olivier Pietquin
OffRL
186
6
0
18 Jul 2019
Agglomerative Attention
Matthew Spellings
77
0
0
15 Jul 2019
Adversarial Video Generation on Complex Datasets
Aidan Clark
Jeff Donahue
Karen Simonyan
VGen
GAN
231
80
0
15 Jul 2019
Sparse Networks from Scratch: Faster Training without Losing Performance
Tim Dettmers
Luke Zettlemoyer
304
357
0
10 Jul 2019
Augmenting Self-attention with Persistent Memory
Sainbayar Sukhbaatar
Edouard Grave
Guillaume Lample
Hervé Jégou
Armand Joulin
RALM
KELM
223
149
0
02 Jul 2019
The University of Sydney's Machine Translation System for WMT19
Conference on Machine Translation (WMT), 2019
Liang Ding
Dacheng Tao
93
13
0
30 Jun 2019
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
Neural Information Processing Systems (NeurIPS), 2019
Shiyang Li
Xiaoyong Jin
Yao Xuan
Xiyou Zhou
Wenhu Chen
Yu Wang
Xifeng Yan
AI4TS
612
1,768
0
29 Jun 2019
A Tensorized Transformer for Language Modeling
Neural Information Processing Systems (NeurIPS), 2019
Xindian Ma
Peng Zhang
Shuai Zhang
Nan Duan
Yuexian Hou
D. Song
M. Zhou
354
186
0
24 Jun 2019
Learning Set-equivariant Functions with SWARM Mappings
Roland Vollgraf
108
3
0
22 Jun 2019
Theoretical Limitations of Self-Attention in Neural Sequence Models
Transactions of the Association for Computational Linguistics (TACL), 2019
Michael Hahn
346
337
0
16 Jun 2019
One Epoch Is All You Need
Aran Komatsuzaki
139
58
0
16 Jun 2019
Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig
Yonatan Belinkov
265
427
0
07 Jun 2019
Scaling Autoregressive Video Models
International Conference on Learning Representations (ICLR), 2019
Dirk Weissenborn
Oscar Täckström
Jakob Uszkoreit
DiffM
VGen
396
232
0
06 Jun 2019
MelNet: A Generative Model for Audio in the Frequency Domain
Sean Vasquez
M. Lewis
DiffM
161
140
0
04 Jun 2019
Exploiting Uncertainty of Loss Landscape for Stochastic Optimization
Vineeth S. Bhaskara
S. Desai
62
1
0
30 May 2019
SCRAM: Spatially Coherent Randomized Attention Maps
D. A. Calian
P. Roelants
Jacques Calì
B. Carr
K. Dubba
John E. Reid
Dell Zhang
124
2
0
24 May 2019
Compression with Flows via Local Bits-Back Coding
Neural Information Processing Systems (NeurIPS), 2019
Jonathan Ho
Evan Lohn
Pieter Abbeel
251
60
0
21 May 2019
An Attentive Survey of Attention Models
S. Chaudhari
Varun Mithal
Gungor Polatkan
R. Ramanath
409
722
0
05 Apr 2019
OCNet: Object Context Network for Scene Parsing
Yuhui Yuan
Lang Huang
Jianyuan Guo
Chao Zhang
Xilin Chen
Jingdong Wang
349
629
0
04 Sep 2018