ResearchTrend.AI
Generating Long Sequences with Sparse Transformers
arXiv:1904.10509, 23 April 2019
R. Child, Scott Gray, Alec Radford, Ilya Sutskever

Papers citing "Generating Long Sequences with Sparse Transformers"

50 / 1,283 papers shown
TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up
  Neural Information Processing Systems (NeurIPS), 2021
  Lezhi Li, Shiyu Chang, Zinan Lin
  14 Feb 2021

Transformer Language Models with LSTM-based Cross-utterance Information Representation
  IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
  G. Sun, Chuxu Zhang, P. Woodland
  12 Feb 2021

Is Space-Time Attention All You Need for Video Understanding?
  International Conference on Machine Learning (ICML), 2021
  Gedas Bertasius, Heng Wang, Lorenzo Torresani
  09 Feb 2021

Colorization Transformer
  International Conference on Learning Representations (ICLR), 2021
  Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner
  08 Feb 2021

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation
  Jieneng Chen, Yongyi Lu, Qihang Yu, Xiangde Luo, Ehsan Adeli, Yan Wang, Le Lu, Alan Yuille, Yuyin Zhou
  08 Feb 2021

Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
  AAAI Conference on Artificial Intelligence (AAAI), 2021
  Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, G. Fung, Yin Li, Vikas Singh
  07 Feb 2021

Mind the Gap: Assessing Temporal Generalization in Neural Language Models
  Neural Information Processing Systems (NeurIPS), 2021
  Angeliki Lazaridou, A. Kuncoro, E. Gribovskaya, Devang Agrawal, Adam Liska, ..., Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom
  03 Feb 2021

TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models
  Conference on Machine Learning and Systems (MLSys), 2021
  Chunxing Yin, Bilge Acun, Xing Liu, Carole-Jean Wu
  25 Jan 2021

Maximum Likelihood Training of Score-Based Diffusion Models
  Neural Information Processing Systems (NeurIPS), 2021
  Yang Song, Conor Durkan, Iain Murray, Stefano Ermon
  22 Jan 2021

SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
  Computer Vision and Pattern Recognition (CVPR), 2021
  Brendan Duke, Abdalla Ahmed, Christian Wolf, P. Aarabi, Graham W. Taylor
  21 Jan 2021

PGT: Pseudo Relevance Feedback Using a Graph-Based Transformer
  European Conference on Information Retrieval (ECIR), 2021
  HongChien Yu, Zhuyun Dai, Jamie Callan
  20 Jan 2021

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  Journal of Machine Learning Research (JMLR), 2021
  W. Fedus, Barret Zoph, Noam M. Shazeer
  11 Jan 2021

Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs
  AAAI Conference on Artificial Intelligence (AAAI), 2021
  Wen-Yi Hsiao, Jen-Yu Liu, Yin-Cheng Yeh, Yi-Hsuan Yang
  07 Jan 2021

Transformers in Vision: A Survey
  ACM Computing Surveys (CSUR), 2021
  Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, M. Shah
  04 Jan 2021

Shortformer: Better Language Modeling using Shorter Inputs
  Annual Meeting of the Association for Computational Linguistics (ACL), 2021
  Ofir Press, Noah A. Smith, M. Lewis
  31 Dec 2020

ERNIE-Doc: A Retrospective Long-Document Modeling Transformer
  Annual Meeting of the Association for Computational Linguistics (ACL), 2021
  Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
  31 Dec 2020

RealFormer: Transformer Likes Residual Attention
  Findings, 2020
  Ruining He, Anirudh Ravula, Bhargav Kanagal, Joshua Ainslie
  21 Dec 2020

Sub-Linear Memory: How to Make Performers SLiM
  Neural Information Processing Systems (NeurIPS), 2020
  Valerii Likhosherstov, K. Choromanski, Jared Davis, Xingyou Song, Adrian Weller
  21 Dec 2020

Taming Transformers for High-Resolution Image Synthesis
  Computer Vision and Pattern Recognition (CVPR), 2020
  Patrick Esser, Robin Rombach, Bjorn Ommer
  17 Dec 2020

SceneFormer: Indoor Scene Generation with Transformers
  International Conference on 3D Vision (3DV), 2020
  Xinpeng Wang, Chandan Yeshwanth, Matthias Nießner
  17 Dec 2020

Revisiting Linformer with a modified self-attention with linear complexity
  Madhusudan Verma
  16 Dec 2020

Learning Energy-Based Models by Diffusion Recovery Likelihood
  International Conference on Learning Representations (ICLR), 2020
  Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P. Kingma
  15 Dec 2020

MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
  Computer Vision and Pattern Recognition (CVPR), 2020
  Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
  01 Dec 2020

Multi-stage Attention ResU-Net for Semantic Segmentation of Fine-Resolution Remote Sensing Images
  IEEE Geoscience and Remote Sensing Letters (GRSL), 2020
  Rui Li, Shunyi Zheng, Chenxi Duan, Jianlin Su, Ce Zhang
  29 Nov 2020

Direct Evolutionary Optimization of Variational Autoencoders With Binary Latents
  E. Guiraud, Jakob Drefs, Jörg Lücke
  27 Nov 2020

A Survey of Deep Learning Approaches for OCR and Document Understanding
  Nishant Subramani, Alexandre Matton, Malcolm Greaves, Adrian Lam
  27 Nov 2020

Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images
  International Conference on Learning Representations (ICLR), 2020
  R. Child
  20 Nov 2020

Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks
  International Conference on Language Resources and Evaluation (LREC), 2020
  Ileana Rugina, Rumen Dangovski, L. Jing, Preslav Nakov, Marin Soljacic
  20 Nov 2020

EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform for NLP Applications
  International Conference on Information and Knowledge Management (CIKM), 2020
  Minghui Qiu, Peng Li, Chengyu Wang, Hanjie Pan, Yaliang Li, ..., Jun Yang, Yaliang Li, Yanjie Liang, Deng Cai, Jialin Li
  18 Nov 2020

s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis
  Xi Wang, Huaiping Ming, Lei He, Frank Soong
  17 Nov 2020

Long Range Arena: A Benchmark for Efficient Transformers
  Yi Tay, Mostafa Dehghani, Samira Abnar, Songlin Yang, Dara Bahri, Philip Pham, J. Rao, Liu Yang, Sebastian Ruder, Donald Metzler
  08 Nov 2020

Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers
  Zhaoshuo Li, Xingtong Liu, Nathan G. Drenkow, Andy S Ding, Francis X. Creighton, Russell H. Taylor, Mathias Unberath
  05 Nov 2020

Deep Learning in Computer-Aided Diagnosis and Treatment of Tumors: A Survey
  Dan Zhao, Guizhi Xu, Xu Zhenghua, Thomas Lukasiewicz, Minmin Xue, Zhigang Fu
  02 Nov 2020

Scaling Laws for Autoregressive Generative Modeling
  T. Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, ..., Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish
  28 Oct 2020

Memory Optimization for Deep Networks
  International Conference on Learning Representations (ICLR), 2020
  Aashaka Shah, Chaoxia Wu, Jayashree Mohan, Vijay Chidambaram, Philipp Krahenbuhl
  27 Oct 2020

Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
  Neural Information Processing Systems (NeurIPS), 2020
  Minjia Zhang, Yuxiong He
  26 Oct 2020

Long Document Ranking with Query-Directed Sparse Transformer
  Findings, 2020
  Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang
  23 Oct 2020

Limitations of Autoregressive Models and Their Alternatives
  Chu-cheng Lin, Aaron Jaech, Xin Li, Matthew R. Gormley, Jason Eisner
  22 Oct 2020

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
  Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
  22 Oct 2020

N-ODE Transformer: A Depth-Adaptive Variant of the Transformer Using Neural Ordinary Differential Equations
  Aaron Baier-Reinio, H. Sterck
  22 Oct 2020

Open Question Answering over Tables and Text
  International Conference on Learning Representations (ICLR), 2020
  Wenhu Chen, Ming-Wei Chang, Eva Schlinger, Wenjie Wang, William W. Cohen
  20 Oct 2020

Rethinking Document-level Neural Machine Translation
  Findings, 2020
  Zewei Sun, Mingxuan Wang, Hao Zhou, Chengqi Zhao, Shujian Huang, Jiajun Chen, Lei Li
  18 Oct 2020

Adaptive Feature Selection for End-to-End Speech Translation
  Findings, 2020
  Biao Zhang, Ivan Titov, Barry Haddow, Rico Sennrich
  16 Oct 2020

Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers
  Alex Lamb, Anirudh Goyal, A. Slowik, Michael C. Mozer, Philippe Beaudoin, Yoshua Bengio
  15 Oct 2020

Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries
  Xiaofei Sun, Zijun Sun, Yuxian Meng, Jiwei Li, Chun Fan
  14 Oct 2020

Memformer: A Memory-Augmented Transformer for Sequence Modeling
  Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, A. Geramifard, Zhou Yu
  14 Oct 2020

Zero-shot Entity Linking with Efficient Long Range Sequence Modeling
  Zonghai Yao, Liangliang Cao, Huapu Pan
  12 Oct 2020

SMYRF: Efficient Attention using Asymmetric Clustering
  Giannis Daras, Nikita Kitaev, Augustus Odena, A. Dimakis
  11 Oct 2020

Deformable DETR: Deformable Transformers for End-to-End Object Detection
  International Conference on Learning Representations (ICLR), 2020
  Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai
  08 Oct 2020

Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications
  Matthew Khoury, Rumen Dangovski, L. Ou, Preslav Nakov, Yichen Shen, L. Jing
  06 Oct 2020
Page 23 of 26