Generating Long Sequences with Sparse Transformers

23 April 2019
R. Child, Scott Gray, Alec Radford, Ilya Sutskever

Papers citing "Generating Long Sequences with Sparse Transformers"

Showing 50 of 1,283 citing papers (page 25 of 26). Each entry lists the paper title, venue and year where available, authors, community tags, activity counts, and publication date.

DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen
AAML · 618 · 3,417 · 0 · 05 Jun 2020

Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Peter Hawkins, Jared Davis, David Belanger, Lucy J. Colwell, Adrian Weller
359 · 93 · 0 · 05 Jun 2020

GMAT: Global Memory Augmentation for Transformers
Ankit Gupta, Jonathan Berant
RALM · 175 · 52 · 0 · 05 Jun 2020

Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL · 2.0K · 52,836 · 0 · 28 May 2020

The Cascade Transformer: an Application for Efficient Answer Sentence Selection
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Luca Soldaini, Alessandro Moschitti
172 · 45 · 0 · 05 May 2020

A Simple Language Model for Task-Oriented Dialogue
Neural Information Processing Systems (NeurIPS), 2020
Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, R. Socher
613 · 557 · 0 · 02 May 2020

Synthesizer: Rethinking Self-Attention in Transformer Models
International Conference on Machine Learning (ICML), 2020
Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng
299 · 382 · 0 · 02 May 2020

Multi-scale Transformer Language Models
Sandeep Subramanian, R. Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau
142 · 18 · 0 · 01 May 2020

Incremental Neural Coreference Resolution in Constant Memory
Patrick Xia, João Sedoc, Benjamin Van Durme
CLL · 176 · 3 · 0 · 30 Apr 2020

Jukebox: A Generative Model for Music
Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever
VLM · 560 · 902 · 0 · 30 Apr 2020

Multiresolution and Multimodal Speech Recognition with Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram
216 · 29 · 0 · 29 Apr 2020

Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching
International Conference on Information and Knowledge Management (CIKM), 2020
Liu Yang, Mingyang Zhang, Cheng Li, Michael Bendersky, Marc Najork
265 · 95 · 0 · 26 Apr 2020

Lite Transformer with Long-Short Range Attention
International Conference on Learning Representations (ICLR), 2020
Zhanghao Wu, Zhijian Liu, Ji Lin, Chengyue Wu, Song Han
181 · 364 · 0 · 24 Apr 2020

On Sparsifying Encoder Outputs in Sequence-to-Sequence Models
Findings, 2020
Biao Zhang, Ivan Titov, Rico Sennrich
97 · 14 · 0 · 24 Apr 2020

Vector Quantized Contrastive Predictive Coding for Template-based Music Generation
Gaëtan Hadjeres, Léopold Crestel
197 · 19 · 0 · 21 Apr 2020

A Spatio-temporal Transformer for 3D Human Motion Prediction
International Conference on 3D Vision (3DV), 2020
Emre Aksan, Manuel Kaufmann, Peng Cao, Otmar Hilliges
ViT · 395 · 278 · 0 · 18 Apr 2020

ETC: Encoding Long and Structured Inputs in Transformers
Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Kenneth Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang
318 · 56 · 0 · 17 Apr 2020

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan
RALM, VLM · 695 · 4,928 · 0 · 10 Apr 2020

Hierarchical Opacity Propagation for Image Matting
Yaoyi Li, Qin Xu, Hongtao Lu
181 · 14 · 0 · 07 Apr 2020

Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences
AAAI Conference on Artificial Intelligence (AAAI), 2020
Andis Draguns, Emīls Ozoliņš, A. Sostaks, Matiss Apinis, Kārlis Freivalds
288 · 10 · 0 · 06 Apr 2020

SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
Neural Information Processing Systems (NeurIPS), 2020
Xiaoya Li, Yuxian Meng, Mingxin Zhou, Qinghong Han, Leilei Gan, Jiwei Li
283 · 22 · 0 · 22 Mar 2020

Cross-Shape Attention for Part Segmentation of 3D Point Clouds
Marios Loizou, Siddhant Garg, Dmitry Petrov, Melinos Averkiou, E. Kalogerakis
3DPC · 379 · 4 · 0 · 20 Mar 2020

Transformer Networks for Trajectory Forecasting
International Conference on Pattern Recognition (ICPR), 2020
Francesco Giuliari, Irtiza Hasan, Marco Cristani, Fabio Galasso
387 · 478 · 0 · 18 Mar 2020

Efficient Content-Based Sparse Attention with Routing Transformers
Transactions of the Association for Computational Linguistics (TACL), 2020
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
MoE · 971 · 686 · 0 · 12 Mar 2020

ProGen: Language Modeling for Protein Generation
bioRxiv, 2020
Ali Madani, Bryan McCann, Nikhil Naik, N. Keskar, N. Anand, Raphael R. Eguchi, Po-Ssu Huang, R. Socher
225 · 317 · 0 · 08 Mar 2020

Meta-Embeddings Based On Self-Attention
Qichen Li, Xiaoke Jiang, Jun Xia, Jian Li
150 · 2 · 0 · 03 Mar 2020

Sparse Sinkhorn Attention
International Conference on Machine Learning (ICML), 2020
Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
219 · 373 · 0 · 26 Feb 2020

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Findings, 2020
Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
371 · 96 · 0 · 24 Feb 2020

PolyGen: An Autoregressive Generative Model of 3D Meshes
International Conference on Machine Learning (ICML), 2020
C. Nash, Yaroslav Ganin, A. Eslami, Peter W. Battaglia
AI4CE · 294 · 306 · 0 · 23 Feb 2020

Predictive Sampling with Forecasting Autoregressive Models
International Conference on Machine Learning (ICML), 2020
Auke Wiggers, Emiel Hoogeboom
BDL · 193 · 17 · 0 · 23 Feb 2020

Addressing Some Limitations of Transformers with Feedback Memory
Angela Fan, Thibaut Lavril, Edouard Grave, Armand Joulin, Sainbayar Sukhbaatar
182 · 11 · 0 · 21 Feb 2020

Low-Rank Bottleneck in Multi-head Attention Models
International Conference on Machine Learning (ICML), 2020
Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
189 · 122 · 0 · 17 Feb 2020

On Layer Normalization in the Transformer Architecture
International Conference on Machine Learning (ICML), 2020
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
AI4CE · 420 · 1,238 · 0 · 12 Feb 2020

Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow
Neural Information Processing Systems (NeurIPS), 2020
Didrik Nielsen, Ole Winther
MQ · 433 · 13 · 0 · 06 Feb 2020

Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions
Yu-Siang Huang, Yi-Hsuan Yang
ViT · 232 · 39 · 0 · 01 Feb 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
1.8K · 6,759 · 0 · 23 Jan 2020

Reformer: The Efficient Transformer
International Conference on Learning Representations (ICLR), 2020
Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya
VLM · 634 · 2,712 · 0 · 13 Jan 2020

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun
184 · 136 · 0 · 25 Dec 2019

Axial Attention in Multidimensional Transformers
Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans
266 · 618 · 0 · 20 Dec 2019

Not All Attention Is Needed: Gated Attention Network for Sequence Data
AAAI Conference on Artificial Intelligence (AAAI), 2019
Lanqing Xue, Xiaopeng Li, Ningyu Zhang
149 · 41 · 0 · 01 Dec 2019

Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models
Computer Vision and Pattern Recognition (CVPR), 2019
Giannis Daras, Augustus Odena, Han Zhang, A. Dimakis
222 · 61 · 0 · 27 Nov 2019

Single Headed Attention RNN: Stop Thinking With Your Head
Stephen Merity
253 · 70 · 0 · 26 Nov 2019

MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning
Guangxiang Zhao, Xu Sun, Jingjing Xu, Zhiyuan Zhang, Liangchen Luo
LRM · 168 · 60 · 0 · 17 Nov 2019

Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling
International Joint Conference on Artificial Intelligence (IJCAI), 2019
Ruizhe Zhao, Brian K. Vogel, Tanvir Ahmed, Wayne Luk
147 · 39 · 0 · 14 Nov 2019

Compressive Transformers for Long-Range Sequence Modelling
International Conference on Learning Representations (ICLR), 2019
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap
RALM, VLM, KELM · 297 · 774 · 0 · 13 Nov 2019

word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement
International Conference on Learning Representations (ICLR), 2019
Ali (Aliakbar) Panahi, Seyran Saeedi, Tom Arodz
130 · 42 · 0 · 12 Nov 2019

BP-Transformer: Modelling Long-Range Context via Binary Partitioning
Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang
215 · 83 · 0 · 11 Nov 2019

Blockwise Self-Attention for Long Document Understanding
Findings, 2019
J. Qiu, Hao Ma, Omer Levy, Scott Yih, Sinong Wang, Jie Tang
306 · 269 · 0 · 07 Nov 2019

Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding
Pan Zhou, Ruchao Fan, Wei Chen, Jia Jia
299 · 26 · 0 · 01 Nov 2019

Injecting Hierarchy with U-Net Transformers
David Donahue, Vladislav Lialin, Anna Rumshisky
AI4CE · 139 · 2 · 0 · 16 Oct 2019