Do Transformers Need Deep Long-Range Memory

7 July 2020
Jack W. Rae, Ali Razavi
RALM

Papers citing "Do Transformers Need Deep Long-Range Memory"

Showing all 28 citing papers.

StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns
Luanbo Wan, Weizhi Ma
LLMAG, KELM
16 Jun 2025

What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?
Jinhong Ni, Chang-Bin Zhang, Qiang Zhang, Jing Zhang
MDE
28 May 2025

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Gleb Rodionov, Roman Garipov, Alina Shutova, George Yakushev, Erik Schultheis, Vage Egiazarian, Anton Sinitsin, Denis Kuznedelev, Dan Alistarh
LRM
08 Apr 2025

Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou, Zixia Jia, Zilong Zheng, Kewei Tu
ODL
24 Jun 2024

Are queries and keys always relevant? A case study on Transformer wave functions
Riccardo Rende, Luciano Loris Viteritti
29 May 2024

SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
Zihao Wang, Shaoduo Gan
07 Apr 2024

Masked Audio Generation using a Single Non-Autoregressive Transformer
International Conference on Learning Representations (ICLR), 2024
Alon Ziv, Itai Gat, Gaël Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi
09 Jan 2024

Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu
14 Dec 2023

AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
Jake Grigsby, Linxi Fan, Yuke Zhu
OffRL, LM&Ro
15 Oct 2023

Long-range Language Modeling with Self-retrieval
Transactions of the Association for Computational Linguistics (TACL), 2023
Ohad Rubin, Jonathan Berant
RALM, KELM
23 Jun 2023

DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Ming Cheng, Long Xiao
08 May 2023

Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge
LRM
05 May 2023

What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement
Neural Information Processing Systems (NeurIPS), 2023
Yotam Alexander, Nimrod De La Vega, Noam Razin, Nadav Cohen
20 Mar 2023

Dissociating language and thought in large language models
Kyle Mahowald, Anna A. Ivanova, I. Blank, Nancy Kanwisher, J. Tenenbaum, Evelina Fedorenko
ELM, ReLM
16 Jan 2023

iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer
Jooyeol Yun, Sanghyeon Lee, Minho Park, Jaegul Choo
ViT
14 Jul 2022

Embedding Recycling for Language Models
Findings, 2022
Jon Saad-Falcon, Amanpreet Singh, Luca Soldaini, Mike D'Arcy, Arman Cohan, Doug Downey
KELM
11 Jul 2022

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Neural Information Processing Systems (NeurIPS), 2022
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
VLM
27 May 2022

The NLP Task Effectiveness of Long-Range Transformers
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Guanghui Qin, Yukun Feng, Benjamin Van Durme
16 Feb 2022

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Computer Vision and Pattern Recognition (CVPR), 2022
Chao-Yuan Wu, Yanghao Li, K. Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer
ViT
20 Jan 2022

Well Googled is Half Done: Multimodal Forecasting of New Fashion Product Sales with Image-based Google Trends
Geri Skenderi, Christian Joppi, Matteo Denitto, Marco Cristani
AI4TS
20 Sep 2021

Do Long-Range Language Models Actually Use Long-Range Context?
Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, Mohit Iyyer
RALM
19 Sep 2021

Can Transformers Jump Around Right in Natural Language? Assessing Performance Transfer from SCAN
Rahma Chaabouni, Roberto Dessì, Eugene Kharitonov
03 Jul 2021

EchoFilter: End-to-End Neural Network for Acoustic Echo Cancellation
Lu Ma, Song Yang, Y. Gong, Xintian Wang, Zhongqin Wu
31 May 2021

Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar, Songlin Yang, Dara Bahri, Philip Pham, J. Rao, Liu Yang, Sebastian Ruder, Donald Metzler
08 Nov 2020

Sparsifying Transformer Models with Trainable Representation Pooling
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Michal Pietruszka, Łukasz Borchmann, Łukasz Garncarek
10 Sep 2020

Neural Language Generation: Formulation, Methods, and Evaluation
Cristina Garbacea, Qiaozhu Mei
31 Jul 2020

Efficient Content-Based Sparse Attention with Routing Transformers
Transactions of the Association for Computational Linguistics (TACL), 2020
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
MoE
12 Mar 2020

Frustratingly Short Attention Spans in Neural Language Modeling
International Conference on Learning Representations (ICLR), 2017
Michal Daniluk, Tim Rocktäschel, Johannes Welbl, Sebastian Riedel
15 Feb 2017