Generating Long Sequences with Sparse Transformers

23 April 2019
R. Child, Scott Gray, Alec Radford, Ilya Sutskever

Papers citing "Generating Long Sequences with Sparse Transformers"

Showing 50 of 1,283 citing papers (page 25 of 26). Each entry lists the paper title, venue and year where available, authors, community tags, activity counts, and publication date.

DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen
AAML · 618 · 3,417 · 0 · 05 Jun 2020

Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Peter Hawkins, Jared Davis, David Belanger, Lucy J. Colwell, Adrian Weller
359 · 93 · 0 · 05 Jun 2020

GMAT: Global Memory Augmentation for Transformers
Ankit Gupta, Jonathan Berant
RALM · 175 · 52 · 0 · 05 Jun 2020

Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL · 2.0K · 52,836 · 0 · 28 May 2020

The Cascade Transformer: an Application for Efficient Answer Sentence Selection
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Luca Soldaini, Alessandro Moschitti
172 · 45 · 0 · 05 May 2020

A Simple Language Model for Task-Oriented Dialogue
Neural Information Processing Systems (NeurIPS), 2020
Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, R. Socher
613 · 557 · 0 · 02 May 2020

Synthesizer: Rethinking Self-Attention in Transformer Models
International Conference on Machine Learning (ICML), 2020
Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng
299 · 382 · 0 · 02 May 2020

Multi-scale Transformer Language Models
Sandeep Subramanian, R. Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau
142 · 18 · 0 · 01 May 2020

Incremental Neural Coreference Resolution in Constant Memory
Patrick Xia, João Sedoc, Benjamin Van Durme
CLL · 176 · 3 · 0 · 30 Apr 2020

Jukebox: A Generative Model for Music
Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever
VLM · 560 · 902 · 0 · 30 Apr 2020

Multiresolution and Multimodal Speech Recognition with Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram
216 · 29 · 0 · 29 Apr 2020

Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching
International Conference on Information and Knowledge Management (CIKM), 2020
Liu Yang, Mingyang Zhang, Cheng Li, Michael Bendersky, Marc Najork
265 · 95 · 0 · 26 Apr 2020

Lite Transformer with Long-Short Range Attention
International Conference on Learning Representations (ICLR), 2020
Zhanghao Wu, Zhijian Liu, Ji Lin, Chengyue Wu, Song Han
181 · 364 · 0 · 24 Apr 2020

On Sparsifying Encoder Outputs in Sequence-to-Sequence Models
Findings, 2020
Biao Zhang, Ivan Titov, Rico Sennrich
97 · 14 · 0 · 24 Apr 2020

Vector Quantized Contrastive Predictive Coding for Template-based Music Generation
Gaëtan Hadjeres, Léopold Crestel
197 · 19 · 0 · 21 Apr 2020

A Spatio-temporal Transformer for 3D Human Motion Prediction
International Conference on 3D Vision (3DV), 2020
Emre Aksan, Manuel Kaufmann, Peng Cao, Otmar Hilliges
ViT · 395 · 278 · 0 · 18 Apr 2020

ETC: Encoding Long and Structured Inputs in Transformers
Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Kenneth Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang
318 · 56 · 0 · 17 Apr 2020

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan
RALM, VLM · 695 · 4,928 · 0 · 10 Apr 2020

Hierarchical Opacity Propagation for Image Matting
Yaoyi Li, Qin Xu, Hongtao Lu
181 · 14 · 0 · 07 Apr 2020

Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences
AAAI Conference on Artificial Intelligence (AAAI), 2020
Andis Draguns, Emīls Ozoliņš, A. Sostaks, Matiss Apinis, Kārlis Freivalds
288 · 10 · 0 · 06 Apr 2020

SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
Neural Information Processing Systems (NeurIPS), 2020
Xiaoya Li, Yuxian Meng, Mingxin Zhou, Qinghong Han, Leilei Gan, Jiwei Li
283 · 22 · 0 · 22 Mar 2020

Cross-Shape Attention for Part Segmentation of 3D Point Clouds
Marios Loizou, Siddhant Garg, Dmitry Petrov, Melinos Averkiou, E. Kalogerakis
3DPC · 379 · 4 · 0 · 20 Mar 2020

Transformer Networks for Trajectory Forecasting
International Conference on Pattern Recognition (ICPR), 2020
Francesco Giuliari, Irtiza Hasan, Marco Cristani, Fabio Galasso
387 · 478 · 0 · 18 Mar 2020

Efficient Content-Based Sparse Attention with Routing Transformers
Transactions of the Association for Computational Linguistics (TACL), 2020
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
MoE · 971 · 686 · 0 · 12 Mar 2020

ProGen: Language Modeling for Protein Generation
bioRxiv, 2020
Ali Madani, Bryan McCann, Nikhil Naik, N. Keskar, N. Anand, Raphael R. Eguchi, Po-Ssu Huang, R. Socher
225 · 317 · 0 · 08 Mar 2020

Meta-Embeddings Based On Self-Attention
Qichen Li, Xiaoke Jiang, Jun Xia, Jian Li
150 · 2 · 0 · 03 Mar 2020

Sparse Sinkhorn Attention
International Conference on Machine Learning (ICML), 2020
Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
219 · 373 · 0 · 26 Feb 2020

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Findings, 2020
Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
371 · 96 · 0 · 24 Feb 2020

PolyGen: An Autoregressive Generative Model of 3D Meshes
International Conference on Machine Learning (ICML), 2020
C. Nash, Yaroslav Ganin, A. Eslami, Peter W. Battaglia
AI4CE · 294 · 306 · 0 · 23 Feb 2020

Predictive Sampling with Forecasting Autoregressive Models
International Conference on Machine Learning (ICML), 2020
Auke Wiggers, Emiel Hoogeboom
BDL · 193 · 17 · 0 · 23 Feb 2020

Addressing Some Limitations of Transformers with Feedback Memory
Angela Fan, Thibaut Lavril, Edouard Grave, Armand Joulin, Sainbayar Sukhbaatar
182 · 11 · 0 · 21 Feb 2020

Low-Rank Bottleneck in Multi-head Attention Models
International Conference on Machine Learning (ICML), 2020
Srinadh Bhojanapalli, Chulhee Yun, A. S. Rawat, Sashank J. Reddi, Sanjiv Kumar
189 · 122 · 0 · 17 Feb 2020

On Layer Normalization in the Transformer Architecture
International Conference on Machine Learning (ICML), 2020
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
AI4CE · 420 · 1,238 · 0 · 12 Feb 2020

Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow
Neural Information Processing Systems (NeurIPS), 2020
Didrik Nielsen, Ole Winther
MQ · 433 · 13 · 0 · 06 Feb 2020

Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions
Yu-Siang Huang, Yi-Hsuan Yang
ViT · 232 · 39 · 0 · 01 Feb 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
1.8K · 6,759 · 0 · 23 Jan 2020

Reformer: The Efficient Transformer
International Conference on Learning Representations (ICLR), 2020
Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya
VLM · 634 · 2,712 · 0 · 13 Jan 2020

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun
184 · 136 · 0 · 25 Dec 2019

Axial Attention in Multidimensional Transformers
Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans
266 · 618 · 0 · 20 Dec 2019

Not All Attention Is Needed: Gated Attention Network for Sequence Data
AAAI Conference on Artificial Intelligence (AAAI), 2019
Lanqing Xue, Xiaopeng Li, Ningyu Zhang
149 · 41 · 0 · 01 Dec 2019

Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models
Computer Vision and Pattern Recognition (CVPR), 2019
Giannis Daras, Augustus Odena, Han Zhang, A. Dimakis
222 · 61 · 0 · 27 Nov 2019

Single Headed Attention RNN: Stop Thinking With Your Head
Stephen Merity
253 · 70 · 0 · 26 Nov 2019

MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning
Guangxiang Zhao, Xu Sun, Jingjing Xu, Zhiyuan Zhang, Liangchen Luo
LRM · 168 · 60 · 0 · 17 Nov 2019

Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling
International Joint Conference on Artificial Intelligence (IJCAI), 2019
Ruizhe Zhao, Brian K. Vogel, Tanvir Ahmed, Wayne Luk
147 · 39 · 0 · 14 Nov 2019

Compressive Transformers for Long-Range Sequence Modelling
International Conference on Learning Representations (ICLR), 2019
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap
RALM, VLM, KELM · 297 · 774 · 0 · 13 Nov 2019

word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement
International Conference on Learning Representations (ICLR), 2019
Ali (Aliakbar) Panahi, Seyran Saeedi, Tom Arodz
130 · 42 · 0 · 12 Nov 2019

BP-Transformer: Modelling Long-Range Context via Binary Partitioning
Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang
215 · 83 · 0 · 11 Nov 2019

Blockwise Self-Attention for Long Document Understanding
Findings, 2019
J. Qiu, Hao Ma, Omer Levy, Scott Yih, Sinong Wang, Jie Tang
306 · 269 · 0 · 07 Nov 2019

Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding
Pan Zhou, Ruchao Fan, Wei Chen, Jia Jia
299 · 26 · 0 · 01 Nov 2019

Injecting Hierarchy with U-Net Transformers
David Donahue, Vladislav Lialin, Anna Rumshisky
AI4CE · 139 · 2 · 0 · 16 Oct 2019