1904.10509
Cited By
Generating Long Sequences with Sparse Transformers
23 April 2019
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
Papers citing
"Generating Long Sequences with Sparse Transformers"
50 / 1,283 papers shown
Stepwise Extractive Summarization and Planning with Structured Transformers
Shashi Narayan
Joshua Maynez
Jakub Adamek
Daniele Pighin
Blaž Bratanič
Ryan T. McDonald
179
33
0
06 Oct 2020
Scene Graph Modification Based on Natural Language Commands
Findings, 2020
Xuanli He
Quan Hung Tran
Gholamreza Haffari
Walter Chang
Trung Bui
Zhe Lin
Franck Dernoncourt
Nhan Dam
GNN
203
9
0
06 Oct 2020
Guiding Attention for Self-Supervised Learning with Transformers
Findings, 2020
Ameet Deshpande
Karthik Narasimhan
157
22
0
06 Oct 2020
Which *BERT? A Survey Organizing Contextualized Encoders
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Patrick Xia
Shijie Wu
Benjamin Van Durme
223
53
0
02 Oct 2020
Rethinking Attention with Performers
K. Choromanski
Valerii Likhosherstov
David Dohan
Xingyou Song
Andreea Gane
...
Afroz Mohiuddin
Lukasz Kaiser
David Belanger
Lucy J. Colwell
Adrian Weller
781
1,979
0
30 Sep 2020
Learning Hard Retrieval Decoder Attention for Transformers
Hongfei Xu
Qiuhui Liu
Josef van Genabith
Deyi Xiong
118
1
0
30 Sep 2020
Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems
Andrea Madotto
Samuel Cahyawijaya
Genta Indra Winata
Yan Xu
Zihan Liu
Mohammad Kachuee
Pascale Fung
296
66
0
28 Sep 2020
Current Limitations of Language Models: What You Need is Retrieval
Aran Komatsuzaki
LRM
124
3
0
15 Sep 2020
Efficient Transformers: A Survey
ACM Computing Surveys (ACM CSUR), 2020
Yi Tay
Mostafa Dehghani
Dara Bahri
Donald Metzler
VLM
866
1,362
0
14 Sep 2020
Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
Findings, 2020
Shuohang Wang
Luowei Zhou
Zhe Gan
Yen-Chun Chen
Yuwei Fang
S. Sun
Yu Cheng
Jingjing Liu
249
32
0
13 Sep 2020
Sparsifying Transformer Models with Trainable Representation Pooling
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Michal Pietruszka
Łukasz Borchmann
Lukasz Garncarek
259
13
0
10 Sep 2020
Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2020
Rui Li
Shunyi Zheng
Chenxi Duan
Ce Zhang
Jianlin Su
P. M. Atkinson
SSeg
422
496
0
03 Sep 2020
Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020
Cong Guo
B. Hsueh
Jingwen Leng
Yuxian Qiu
Yue Guan
Zehuan Wang
Xiaoying Jia
Xipeng Li
Minyi Guo
Yuhao Zhu
167
90
0
29 Aug 2020
Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation
IEEE Transactions on Image Processing (TIP), 2020
Yurui Ren
Ge Li
Shan Liu
Thomas H. Li
3DH
270
75
0
27 Aug 2020
AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization
Findings, 2020
Xinsong Zhang
Pengshuai Li
Hang Li
391
56
0
27 Aug 2020
Generating Music with a Self-Correcting Non-Chronological Autoregressive Model
Wayne Chi
Prachi Kumar
Suri Yaddanapudi
Rahul Suresh
Umut Isik
KELM
216
10
0
18 Aug 2020
PopMAG: Pop Music Accompaniment Generation
Yi Ren
Jinzheng He
Xu Tan
Tao Qin
Zhou Zhao
Tie-Yan Liu
225
132
0
18 Aug 2020
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Albert Gu
Tri Dao
Stefano Ermon
Atri Rudra
Christopher Ré
407
813
0
17 Aug 2020
Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size
Davis Yoshida
Allyson Ettinger
Kevin Gimpel
AI4CE
200
7
0
16 Aug 2020
Compression of Deep Learning Models for Text: A Survey
ACM Transactions on Knowledge Discovery from Data (TKDD), 2020
Manish Gupta
Puneet Agrawal
VLM
MedIm
AI4CE
511
134
0
12 Aug 2020
DeLighT: Deep and Light-weight Transformer
Sachin Mehta
Marjan Ghazvininejad
Srini Iyer
Luke Zettlemoyer
Hannaneh Hajishirzi
VLM
249
34
0
03 Aug 2020
The Chess Transformer: Mastering Play using Generative Language Models
David Noever
Matt Ciolino
Josh Kalin
574
45
0
02 Aug 2020
Neural Language Generation: Formulation, Methods, and Evaluation
Cristina Garbacea
Qiaozhu Mei
362
30
0
31 Jul 2020
Linear Attention Mechanism: An Efficient Attention for Semantic Segmentation
Rui Li
Jianlin Su
Chenxi Duan
Shunyi Zheng
3DV
172
47
0
29 Jul 2020
TensorCoder: Dimension-Wise Attention via Tensor Representation for Natural Language Modeling
Shuai Zhang
Peng Zhang
Xindian Ma
Junqiu Wei
Ning Wang
Qun Liu
122
5
0
28 Jul 2020
Big Bird: Transformers for Longer Sequences
Neural Information Processing Systems (NeurIPS), 2020
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
1.3K
2,532
0
28 Jul 2020
Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks
Kirill Mazur
Victor Lempitsky
3DPC
400
49
0
22 Jul 2020
DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation
Alexandre Carlier
Martin Danelljan
Alexandre Alahi
Radu Timofte
517
178
0
22 Jul 2020
Conformer-Kernel with Query Term Independence for Document Retrieval
Bhaskar Mitra
Sebastian Hofstätter
Hamed Zamani
Nick Craswell
172
22
0
20 Jul 2020
Autoregressive Unsupervised Image Segmentation
European Conference on Computer Vision (ECCV), 2020
Yassine Ouali
Céline Hudelot
Myriam Tami
SSL
236
89
0
16 Jul 2020
ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing
bioRxiv (bioRxiv), 2020
Ahmed Elnaggar
M. Heinzinger
Christian Dallago
Ghalia Rehawi
Yu Wang
...
Tamas B. Fehér
Christoph Angerer
Martin Steinegger
D. Bhowmik
B. Rost
DRL
472
1,156
0
13 Jul 2020
Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation
Aditya Mogadala
Marius Mosbach
Dietrich Klakow
VLM
725
0
0
12 Jul 2020
Variable Skipping for Autoregressive Range Density Estimation
International Conference on Machine Learning (ICML), 2020
Eric Liang
Zongheng Yang
Ion Stoica
Pieter Abbeel
Yan Duan
Xi Chen
193
4
0
10 Jul 2020
Fast Transformers with Clustered Attention
Neural Information Processing Systems (NeurIPS), 2020
Apoorv Vyas
Angelos Katharopoulos
François Fleuret
281
171
0
09 Jul 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov
Nikoli Dryden
Tal Ben-Nun
Shigang Li
Torsten Hoefler
418
168
0
30 Jun 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
François Fleuret
724
2,328
0
29 Jun 2020
Matrix Shuffle-Exchange Networks for Hard 2D Tasks
Emīls Ozoliņš
Kārlis Freivalds
A. Sostaks
99
0
0
29 Jun 2020
Streaming Transformer ASR with Blockwise Synchronous Beam Search
E. Tsunoo
Yosuke Kashiwagi
Shinji Watanabe
313
11
0
25 Jun 2020
Locally Masked Convolution for Autoregressive Models
Ajay Jain
Pieter Abbeel
Deepak Pathak
DiffM
OffRL
203
32
0
22 Jun 2020
Memory Transformer
Andrey Kravchenko
Yuri Kuratov
Anton Peganov
Grigory V. Sapunov
RALM
248
85
0
20 Jun 2020
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
5.1K
26,105
0
19 Jun 2020
Sparse GPU Kernels for Deep Learning
Trevor Gale
Matei A. Zaharia
C. Young
Erich Elsen
270
265
0
18 Jun 2020
A Tutorial on VAEs: From Bayes' Rule to Lossless Compression
Ronald Yu
BDL
160
26
0
18 Jun 2020
SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization
Yao-Min Zhao
Mohammad Saleh
Peter J. Liu
RALM
167
27
0
18 Jun 2020
Untangling tradeoffs between recurrence and self-attention in neural networks
Giancarlo Kerg
Bhargav Kanuparthi
Anirudh Goyal
Kyle Goyette
Yoshua Bengio
Guillaume Lajoie
170
9
0
16 Jun 2020
AlgebraNets
Jordan Hoffmann
Simon Schmitt
Simon Osindero
Karen Simonyan
Erich Elsen
MoE
410
6
0
12 Jun 2020
Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning
International Conference on Learning Representations (ICLR), 2020
Ruozi Huang
Huang Hu
Wei Wu
Kei Sawada
Mi Zhang
Daxin Jiang
515
133
0
11 Jun 2020
Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers
Tsung-Han Wu
Chun-Chen Hsieh
Yen-Hao Chen
Po-Han Chi
Hung-yi Lee
223
1
0
09 Jun 2020
O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Chulhee Yun
Yin-Wen Chang
Srinadh Bhojanapalli
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
230
95
0
08 Jun 2020
Linformer: Self-Attention with Linear Complexity
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
450
2,075
0
08 Jun 2020
Page 24 of 26