Generating Long Sequences with Sparse Transformers

23 April 2019

Papers citing "Generating Long Sequences with Sparse Transformers"

50 / 1,283 papers shown

TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale UpNeural Information Processing Systems (NeurIPS), 2021

603

464

14 Feb 2021

Transformer Language Models with LSTM-based Cross-utterance Information RepresentationIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

G. Sun

Chuxu Zhang

P. Woodland

225

12 Feb 2021

Is Space-Time Attention All You Need for Video Understanding?International Conference on Machine Learning (ICML), 2021

Gedas Bertasius

Heng Wang

Lorenzo Torresani

ViT

1.1K

2,656

09 Feb 2021

Colorization TransformerInternational Conference on Learning Representations (ICLR), 2021

625

164

08 Feb 2021

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

485

4,880

08 Feb 2021

Nyströmformer: A Nyström-Based Algorithm for Approximating Self-AttentionAAAI Conference on Artificial Intelligence (AAAI), 2021

Mingxing Tan

Yin Li

413

621

07 Feb 2021

Mind the Gap: Assessing Temporal Generalization in Neural Language ModelsNeural Information Processing Systems (NeurIPS), 2021

...

446

251

03 Feb 2021

TT-Rec: Tensor Train Compression for Deep Learning Recommendation ModelsConference on Machine Learning and Systems (MLSys), 2021

273

115

25 Jan 2021

Maximum Likelihood Training of Score-Based Diffusion ModelsNeural Information Processing Systems (NeurIPS), 2021

787

804

22 Jan 2021

SSTVOS: Sparse Spatiotemporal Transformers for Video Object SegmentationComputer Vision and Pattern Recognition (CVPR), 2021

242

190

21 Jan 2021

PGT: Pseudo Relevance Feedback Using a Graph-Based TransformerEuropean Conference on Information Retrieval (ECIR), 2021

HongChien Yu

Zhuyun Dai

Jamie Callan

119

20 Jan 2021

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient SparsityJournal of machine learning research (JMLR), 2021

577

3,139

11 Jan 2021

Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed HypergraphsAAAI Conference on Artificial Intelligence (AAAI), 2021

272

228

07 Jan 2021

Transformers in Vision: A SurveyACM Computing Surveys (CSUR), 2021

Salman Khan

924

3,176

04 Jan 2021

Shortformer: Better Language Modeling using Shorter InputsAnnual Meeting of the Association for Computational Linguistics (ACL), 2021

Ofir Press

Noah A. Smith

M. Lewis

664

31 Dec 2020

ERNIE-Doc: A Retrospective Long-Document Modeling TransformerAnnual Meeting of the Association for Computational Linguistics (ACL), 2021

270

31 Dec 2020

RealFormer: Transformer Likes Residual AttentionFindings (Findings), 2020

Ruining He

Anirudh Ravula

Bhargav Kanagal

Joshua Ainslie

326

127

21 Dec 2020

Sub-Linear Memory: How to Make Performers SLiMNeural Information Processing Systems (NeurIPS), 2020

Valerii Likhosherstov

235

21 Dec 2020

Taming Transformers for High-Resolution Image SynthesisComputer Vision and Pattern Recognition (CVPR), 2020

728

3,800

17 Dec 2020

SceneFormer: Indoor Scene Generation with TransformersInternational Conference on 3D Vision (3DV), 2020

Xinpeng Wang

Chandan Yeshwanth

Matthias Nießner

ViT 3DPC

246

188

17 Dec 2020

Revisiting Linformer with a modified self-attention with linear complexity

Madhusudan Verma

117

16 Dec 2020

Learning Energy-Based Models by Diffusion Recovery LikelihoodInternational Conference on Learning Representations (ICLR), 2020

Ruiqi Gao

338

137

15 Dec 2020

MaX-DeepLab: End-to-End Panoptic Segmentation with Mask TransformersComputer Vision and Pattern Recognition (CVPR), 2020

665

592

01 Dec 2020

Multi-stage Attention ResU-Net for Semantic Segmentation of Fine-Resolution Remote Sensing ImagesIEEE Geoscience and Remote Sensing Letters (GRSL), 2020

256

252

29 Nov 2020

Direct Evolutionary Optimization of Variational Autoencoders With Binary Latents

247

27 Nov 2020

A Survey of Deep Learning Approaches for OCR and Document Understanding

167

27 Nov 2020

Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on ImagesInternational Conference on Learning Representations (ICLR), 2020

R. Child

BDL VLM

453

381

20 Nov 2020

Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural NetworksInternational Conference on Language Resources and Evaluation (LREC), 2020

277

20 Nov 2020

EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform for NLP ApplicationsInternational Conference on Information and Knowledge Management (CIKM), 2020

Minghui Qiu

Peng Li

Chengyu Wang

...

Yaliang Li

362

18 Nov 2020

s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis

108

17 Nov 2020

Long Range Arena: A Benchmark for Efficient Transformers

383

835

08 Nov 2020

Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers

579

351

05 Nov 2020

Deep Learning in Computer-Aided Diagnosis and Treatment of Tumors: A Survey

228

02 Nov 2020

Scaling Laws for Autoregressive Generative Modeling

...

474

558

28 Oct 2020

Memory Optimization for Deep NetworksInternational Conference on Learning Representations (ICLR), 2020

160

27 Oct 2020

Accelerating Training of Transformer-Based Language Models with Progressive Layer DroppingNeural Information Processing Systems (NeurIPS), 2020

Minjia Zhang

Yuxiong He

AI4CE

149

119

26 Oct 2020

Long Document Ranking with Query-Directed Sparse TransformerFindings (Findings), 2020

182

23 Oct 2020

Limitations of Autoregressive Models and Their Alternatives

174

22 Oct 2020

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy

...

1.4K

55,389

22 Oct 2020

N-ODE Transformer: A Depth-Adaptive Variant of the Transformer Using Neural Ordinary Differential Equations

Aaron Baier-Reinio

H. Sterck

148

22 Oct 2020

Open Question Answering over Tables and TextInternational Conference on Learning Representations (ICLR), 2020

308

227

20 Oct 2020

Rethinking Document-level Neural Machine TranslationFindings (Findings), 2020

Lei Li

360

18 Oct 2020

Adaptive Feature Selection for End-to-End Speech TranslationFindings (Findings), 2020

170

16 Oct 2020

Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers

215

15 Oct 2020

Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries

Jiwei Li

215

14 Oct 2020

Memformer: A Memory-Augmented Transformer for Sequence Modeling

Kun Qian

221

14 Oct 2020

Zero-shot Entity Linking with Efficient Long Range Sequence Modeling

226

12 Oct 2020

SMYRF: Efficient Attention using Asymmetric Clustering

238

11 Oct 2020

Deformable DETR: Deformable Transformers for End-to-End Object DetectionInternational Conference on Learning Representations (ICLR), 2020

Weijie Su

765

6,703

08 Oct 2020

Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications

105

06 Oct 2020