ResearchTrend.AI
Generating Long Sequences with Sparse Transformers

Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever
23 April 2019 (arXiv:1904.10509)

Papers citing "Generating Long Sequences with Sparse Transformers"

32 / 1,282 papers shown

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization (MLSys, 2019)
Paras Jain, Ajay Jain, Aniruddha Nrusimha, A. Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez
07 Oct 2019

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (ICLR, 2019)
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
26 Sep 2019

Exascale Deep Learning for Scientific Inverse Problems
N. Laanait, Josh Romero, Junqi Yin, M. T. Young, Sean Treichler, V. Starchenko, A. Borisevich, Alexander Sergeev, Michael A. Matheson
24 Sep 2019

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019

CTRL: A Conditional Transformer Language Model for Controllable Generation
N. Keskar, Bryan McCann, Lav Varshney, Caiming Xiong, R. Socher
11 Sep 2019

Forecaster: A Graph Transformer for Forecasting Spatial and Time-Dependent Data (ECAI, 2019)
Yongqian Li, J. M. F. Moura
09 Sep 2019

Deep Equilibrium Models (NeurIPS, 2019)
Shaojie Bai, J. Zico Kolter, V. Koltun
03 Sep 2019

Logic and the 2-Simplicial Transformer (ICLR, 2019)
James Clift, D. Doryn, Daniel Murfet, James Wallbridge
02 Sep 2019

Adaptively Sparse Transformers (EMNLP, 2019)
Gonçalo M. Correia, Vlad Niculae, André F. T. Martins
30 Aug 2019

Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel (EMNLP, 2019)
Yifan Hao, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
30 Aug 2019

Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention (EMNLP, 2019)
Biao Zhang, Ivan Titov, Rico Sennrich
29 Aug 2019

BERT for Coreference Resolution: Baselines and Analysis (EMNLP, 2019)
Mandar Joshi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer
24 Aug 2019

Interlaced Sparse Self-Attention for Semantic Segmentation
Lang Huang, Yuhui Yuan, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang
29 Jul 2019

Self-Attentional Credit Assignment for Transfer in Reinforcement Learning
Johan Ferret, Raphaël Marinier, Matthieu Geist, Olivier Pietquin
18 Jul 2019

Agglomerative Attention
Matthew Spellings
15 Jul 2019

Adversarial Video Generation on Complex Datasets
Aidan Clark, Jeff Donahue, Karen Simonyan
15 Jul 2019

Sparse Networks from Scratch: Faster Training without Losing Performance
Tim Dettmers, Luke Zettlemoyer
10 Jul 2019

Augmenting Self-attention with Persistent Memory
Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Armand Joulin
02 Jul 2019

The University of Sydney's Machine Translation System for WMT19 (WMT, 2019)
Liang Ding, Dacheng Tao
30 Jun 2019

Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting (NeurIPS, 2019)
Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu Wang, Xifeng Yan
29 Jun 2019

A Tensorized Transformer for Language Modeling (NeurIPS, 2019)
Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, D. Song, M. Zhou
24 Jun 2019

Learning Set-equivariant Functions with SWARM Mappings
Roland Vollgraf
22 Jun 2019

Theoretical Limitations of Self-Attention in Neural Sequence Models (TACL, 2019)
Michael Hahn
16 Jun 2019

One Epoch Is All You Need
Aran Komatsuzaki
16 Jun 2019

Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig, Yonatan Belinkov
07 Jun 2019

Scaling Autoregressive Video Models (ICLR, 2019)
Dirk Weissenborn, Oscar Täckström, Jakob Uszkoreit
06 Jun 2019

MelNet: A Generative Model for Audio in the Frequency Domain
Sean Vasquez, M. Lewis
04 Jun 2019

Exploiting Uncertainty of Loss Landscape for Stochastic Optimization
Vineeth S. Bhaskara, S. Desai
30 May 2019

SCRAM: Spatially Coherent Randomized Attention Maps
D. A. Calian, P. Roelants, Jacques Calì, B. Carr, K. Dubba, John E. Reid, Dell Zhang
24 May 2019

Compression with Flows via Local Bits-Back Coding (NeurIPS, 2019)
Jonathan Ho, Evan Lohn, Pieter Abbeel
21 May 2019

An Attentive Survey of Attention Models
S. Chaudhari, Varun Mithal, Gungor Polatkan, R. Ramanath
05 Apr 2019

OCNet: Object Context Network for Scene Parsing
Yuhui Yuan, Lang Huang, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang
04 Sep 2018