arXiv:1904.10509
Cited By
Generating Long Sequences with Sparse Transformers
23 April 2019
R. Child, Scott Gray, Alec Radford, Ilya Sutskever
Papers citing "Generating Long Sequences with Sparse Transformers"
50 / 1,283 papers shown
- TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up (NeurIPS 2021). Lezhi Li, Shiyu Chang, Zinan Lin. [ViT] 603 / 464 / 0. 14 Feb 2021
- Transformer Language Models with LSTM-based Cross-utterance Information Representation (ICASSP 2021). G. Sun, Chuxu Zhang, P. Woodland. 225 / 35 / 0. 12 Feb 2021
- Is Space-Time Attention All You Need for Video Understanding? (ICML 2021). Gedas Bertasius, Heng Wang, Lorenzo Torresani. [ViT] 1.1K / 2,656 / 0. 09 Feb 2021
- Colorization Transformer (ICLR 2021). Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner. [ViT] 625 / 164 / 0. 08 Feb 2021
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. Jieneng Chen, Yongyi Lu, Qihang Yu, Xiangde Luo, Ehsan Adeli, Yan Wang, Le Lu, Alan Yuille, Yuyin Zhou. [ViT, MedIm] 485 / 4,880 / 0. 08 Feb 2021
- Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention (AAAI 2021). Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, G. Fung, Yin Li, Vikas Singh. 413 / 621 / 0. 07 Feb 2021
- Mind the Gap: Assessing Temporal Generalization in Neural Language Models (NeurIPS 2021). Angeliki Lazaridou, A. Kuncoro, E. Gribovskaya, Devang Agrawal, Adam Liska, ..., Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom. [VLM] 446 / 251 / 0. 03 Feb 2021
- TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models (MLSys 2021). Chunxing Yin, Bilge Acun, Xing Liu, Carole-Jean Wu. 273 / 115 / 0. 25 Jan 2021
- Maximum Likelihood Training of Score-Based Diffusion Models (NeurIPS 2021). Yang Song, Conor Durkan, Iain Murray, Stefano Ermon. [DiffM] 787 / 804 / 0. 22 Jan 2021
- SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation (CVPR 2021). Brendan Duke, Abdalla Ahmed, Christian Wolf, P. Aarabi, Graham W. Taylor. [VOS] 242 / 190 / 0. 21 Jan 2021
- PGT: Pseudo Relevance Feedback Using a Graph-Based Transformer (ECIR 2021). HongChien Yu, Zhuyun Dai, Jamie Callan. 119 / 24 / 0. 20 Jan 2021
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (JMLR 2021). W. Fedus, Barret Zoph, Noam M. Shazeer. [MoE] 577 / 3,139 / 0. 11 Jan 2021
- Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs (AAAI 2021). Wen-Yi Hsiao, Jen-Yu Liu, Yin-Cheng Yeh, Yi-Hsuan Yang. 272 / 228 / 0. 07 Jan 2021
- Transformers in Vision: A Survey (ACM CSUR 2021). Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, M. Shah. [ViT] 924 / 3,176 / 0. 04 Jan 2021
- Shortformer: Better Language Modeling using Shorter Inputs (ACL 2021). Ofir Press, Noah A. Smith, M. Lewis. 664 / 96 / 0. 31 Dec 2020
- ERNIE-Doc: A Retrospective Long-Document Modeling Transformer (ACL 2021). Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang. 270 / 62 / 0. 31 Dec 2020
- RealFormer: Transformer Likes Residual Attention (Findings 2020). Ruining He, Anirudh Ravula, Bhargav Kanagal, Joshua Ainslie. 326 / 127 / 0. 21 Dec 2020
- Sub-Linear Memory: How to Make Performers SLiM (NeurIPS 2020). Valerii Likhosherstov, K. Choromanski, Jared Davis, Xingyou Song, Adrian Weller. 235 / 21 / 0. 21 Dec 2020
- Taming Transformers for High-Resolution Image Synthesis (CVPR 2020). Patrick Esser, Robin Rombach, Bjorn Ommer. [ViT] 728 / 3,800 / 0. 17 Dec 2020
- SceneFormer: Indoor Scene Generation with Transformers (3DV 2020). Xinpeng Wang, Chandan Yeshwanth, Matthias Nießner. [ViT, 3DPC] 246 / 188 / 0. 17 Dec 2020
- Revisiting Linformer with a modified self-attention with linear complexity. Madhusudan Verma. 117 / 8 / 0. 16 Dec 2020
- Learning Energy-Based Models by Diffusion Recovery Likelihood (ICLR 2020). Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P. Kingma. [DiffM] 338 / 137 / 0. 15 Dec 2020
- MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers (CVPR 2020). Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen. [ViT] 665 / 592 / 0. 01 Dec 2020
- Multi-stage Attention ResU-Net for Semantic Segmentation of Fine-Resolution Remote Sensing Images (IEEE GRSL 2020). Rui Li, Shunyi Zheng, Chenxi Duan, Jianlin Su, Ce Zhang. 256 / 252 / 0. 29 Nov 2020
- Direct Evolutionary Optimization of Variational Autoencoders With Binary Latents. E. Guiraud, Jakob Drefs, Jörg Lücke. [DRL] 247 / 3 / 0. 27 Nov 2020
- A Survey of Deep Learning Approaches for OCR and Document Understanding. Nishant Subramani, Alexandre Matton, Malcolm Greaves, Adrian Lam. 167 / 76 / 0. 27 Nov 2020
- Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images (ICLR 2020). R. Child. [BDL, VLM] 453 / 381 / 0. 20 Nov 2020
- Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks (LREC 2020). Ileana Rugina, Rumen Dangovski, L. Jing, Preslav Nakov, Marin Soljacic. 277 / 0 / 0. 20 Nov 2020
- EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform for NLP Applications (CIKM 2020). Minghui Qiu, Peng Li, Chengyu Wang, Hanjie Pan, Yaliang Li, ..., Jun Yang, Yaliang Li, Yanjie Liang, Deng Cai, Jialin Li. [VLM, SyDa] 362 / 20 / 0. 18 Nov 2020
- s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis. Xi Wang, Huaiping Ming, Lei He, Frank Soong. 108 / 5 / 0. 17 Nov 2020
- Long Range Arena: A Benchmark for Efficient Transformers. Yi Tay, Mostafa Dehghani, Samira Abnar, Songlin Yang, Dara Bahri, Philip Pham, J. Rao, Liu Yang, Sebastian Ruder, Donald Metzler. 383 / 835 / 0. 08 Nov 2020
- Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers. Zhaoshuo Li, Xingtong Liu, Nathan G. Drenkow, Andy S Ding, Francis X. Creighton, Russell H. Taylor, Mathias Unberath. [MDE, ViT] 579 / 351 / 0. 05 Nov 2020
- Deep Learning in Computer-Aided Diagnosis and Treatment of Tumors: A Survey. Dan Zhao, Guizhi Xu, Xu Zhenghua, Thomas Lukasiewicz, Minmin Xue, Zhigang Fu. [OOD] 228 / 4 / 0. 02 Nov 2020
- Scaling Laws for Autoregressive Generative Modeling. T. Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, ..., Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish. 474 / 558 / 0. 28 Oct 2020
- Memory Optimization for Deep Networks (ICLR 2020). Aashaka Shah, Chaoxia Wu, Jayashree Mohan, Vijay Chidambaram, Philipp Krahenbuhl. 160 / 27 / 0. 27 Oct 2020
- Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping (NeurIPS 2020). Minjia Zhang, Yuxiong He. [AI4CE] 149 / 119 / 0. 26 Oct 2020
- Long Document Ranking with Query-Directed Sparse Transformer (Findings 2020). Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang. 182 / 27 / 0. 23 Oct 2020
- Limitations of Autoregressive Models and Their Alternatives. Chu-cheng Lin, Aaron Jaech, Xin Li, Matthew R. Gormley, Jason Eisner. 174 / 73 / 0. 22 Oct 2020
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby. [ViT] 1.4K / 55,389 / 0. 22 Oct 2020
- N-ODE Transformer: A Depth-Adaptive Variant of the Transformer Using Neural Ordinary Differential Equations. Aaron Baier-Reinio, H. Sterck. 148 / 11 / 0. 22 Oct 2020
- Open Question Answering over Tables and Text (ICLR 2020). Wenhu Chen, Ming-Wei Chang, Eva Schlinger, Wenjie Wang, William W. Cohen. [LMTD, RALM] 308 / 227 / 0. 20 Oct 2020
- Rethinking Document-level Neural Machine Translation (Findings 2020). Zewei Sun, Mingxuan Wang, Hao Zhou, Chengqi Zhao, Shujian Huang, Jiajun Chen, Lei Li. [VLM] 360 / 53 / 0. 18 Oct 2020
- Adaptive Feature Selection for End-to-End Speech Translation (Findings 2020). Biao Zhang, Ivan Titov, Barry Haddow, Rico Sennrich. 170 / 41 / 0. 16 Oct 2020
- Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers. Alex Lamb, Anirudh Goyal, A. Slowik, Michael C. Mozer, Philippe Beaudoin, Yoshua Bengio. 215 / 3 / 0. 15 Oct 2020
- Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries. Xiaofei Sun, Zijun Sun, Yuxian Meng, Jiwei Li, Chun Fan. 215 / 24 / 0. 14 Oct 2020
- Memformer: A Memory-Augmented Transformer for Sequence Modeling. Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, A. Geramifard, Zhou Yu. 221 / 68 / 0. 14 Oct 2020
- Zero-shot Entity Linking with Efficient Long Range Sequence Modeling. Zonghai Yao, Liangliang Cao, Huapu Pan. [VLM] 226 / 24 / 0. 12 Oct 2020
- SMYRF: Efficient Attention using Asymmetric Clustering. Giannis Daras, Nikita Kitaev, Augustus Odena, A. Dimakis. 238 / 49 / 0. 11 Oct 2020
- Deformable DETR: Deformable Transformers for End-to-End Object Detection (ICLR 2020). Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai. [ViT] 765 / 6,703 / 0. 08 Oct 2020
- Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications. Matthew Khoury, Rumen Dangovski, L. Ou, Preslav Nakov, Yichen Shen, L. Jing. 105 / 0 / 0. 06 Oct 2020
Page 23 of 26