ResearchTrend.AI
Generating Long Sequences with Sparse Transformers

Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever
23 April 2019 (arXiv:1904.10509)

Papers citing "Generating Long Sequences with Sparse Transformers"

32 / 1,282 papers shown

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization (MLSys, 2019)
Paras Jain, Ajay Jain, Aniruddha Nrusimha, A. Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez
07 Oct 2019

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (ICLR, 2019)
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
26 Sep 2019

Exascale Deep Learning for Scientific Inverse Problems
N. Laanait, Josh Romero, Junqi Yin, M. T. Young, Sean Treichler, V. Starchenko, A. Borisevich, Alexander Sergeev, Michael A. Matheson
24 Sep 2019

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019

CTRL: A Conditional Transformer Language Model for Controllable Generation
N. Keskar, Bryan McCann, Lav Varshney, Caiming Xiong, R. Socher
11 Sep 2019

Forecaster: A Graph Transformer for Forecasting Spatial and Time-Dependent Data (ECAI, 2019)
Yongqian Li, J. M. F. Moura
09 Sep 2019

Deep Equilibrium Models (NeurIPS, 2019)
Shaojie Bai, J. Zico Kolter, V. Koltun
03 Sep 2019

Logic and the 2-Simplicial Transformer (ICLR, 2019)
James Clift, D. Doryn, Daniel Murfet, James Wallbridge
02 Sep 2019

Adaptively Sparse Transformers (EMNLP, 2019)
Gonçalo M. Correia, Vlad Niculae, André F. T. Martins
30 Aug 2019

Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel (EMNLP, 2019)
Yifan Hao, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
30 Aug 2019

Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention (EMNLP, 2019)
Biao Zhang, Ivan Titov, Rico Sennrich
29 Aug 2019

BERT for Coreference Resolution: Baselines and Analysis (EMNLP, 2019)
Mandar Joshi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer
24 Aug 2019

Interlaced Sparse Self-Attention for Semantic Segmentation
Lang Huang, Yuhui Yuan, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang
29 Jul 2019

Self-Attentional Credit Assignment for Transfer in Reinforcement Learning
Johan Ferret, Raphaël Marinier, Matthieu Geist, Olivier Pietquin
18 Jul 2019

Agglomerative Attention
Matthew Spellings
15 Jul 2019

Adversarial Video Generation on Complex Datasets
Aidan Clark, Jeff Donahue, Karen Simonyan
15 Jul 2019

Sparse Networks from Scratch: Faster Training without Losing Performance
Tim Dettmers, Luke Zettlemoyer
10 Jul 2019

Augmenting Self-attention with Persistent Memory
Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Armand Joulin
02 Jul 2019

The University of Sydney's Machine Translation System for WMT19 (WMT, 2019)
Liang Ding, Dacheng Tao
30 Jun 2019

Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting (NeurIPS, 2019)
Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu Wang, Xifeng Yan
29 Jun 2019

A Tensorized Transformer for Language Modeling (NeurIPS, 2019)
Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, D. Song, M. Zhou
24 Jun 2019

Learning Set-equivariant Functions with SWARM Mappings
Roland Vollgraf
22 Jun 2019

Theoretical Limitations of Self-Attention in Neural Sequence Models (TACL, 2019)
Michael Hahn
16 Jun 2019

One Epoch Is All You Need
Aran Komatsuzaki
16 Jun 2019

Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig, Yonatan Belinkov
07 Jun 2019

Scaling Autoregressive Video Models (ICLR, 2019)
Dirk Weissenborn, Oscar Täckström, Jakob Uszkoreit
06 Jun 2019

MelNet: A Generative Model for Audio in the Frequency Domain
Sean Vasquez, M. Lewis
04 Jun 2019

Exploiting Uncertainty of Loss Landscape for Stochastic Optimization
Vineeth S. Bhaskara, S. Desai
30 May 2019

SCRAM: Spatially Coherent Randomized Attention Maps
D. A. Calian, P. Roelants, Jacques Calì, B. Carr, K. Dubba, John E. Reid, Dell Zhang
24 May 2019

Compression with Flows via Local Bits-Back Coding (NeurIPS, 2019)
Jonathan Ho, Evan Lohn, Pieter Abbeel
21 May 2019

An Attentive Survey of Attention Models
S. Chaudhari, Varun Mithal, Gungor Polatkan, R. Ramanath
05 Apr 2019

OCNet: Object Context Network for Scene Parsing
Yuhui Yuan, Lang Huang, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang
04 Sep 2018