Do Transformers Need Deep Long-Range Memory? (arXiv:2007.03356)
7 July 2020
Jack W. Rae, Ali Razavi
Tags: RALM
Papers citing "Do Transformers Need Deep Long-Range Memory?" (28 of 28 papers shown)
StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns
Luanbo Wan, Weizhi Ma. Tags: LLMAG, KELM. 16 Jun 2025.

What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?
Jinhong Ni, Chang-Bin Zhang, Qiang Zhang, Jing Zhang. Tags: MDE. 28 May 2025.

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Gleb Rodionov, Roman Garipov, Alina Shutova, George Yakushev, Erik Schultheis, Vage Egiazarian, Anton Sinitsin, Denis Kuznedelev, Dan Alistarh. Tags: LRM. 08 Apr 2025.

Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou, Zixia Jia, Zilong Zheng, Kewei Tu. Tags: ODL. 24 Jun 2024.

Are queries and keys always relevant? A case study on Transformer wave functions
Riccardo Rende, Luciano Loris Viteritti. 29 May 2024.

SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
Zihao Wang, Shaoduo Gan. 07 Apr 2024.

Masked Audio Generation using a Single Non-Autoregressive Transformer
International Conference on Learning Representations (ICLR), 2024
Alon Ziv, Itai Gat, Gaël Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi. 09 Jan 2024.

Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu. 14 Dec 2023.

AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
Jake Grigsby, Linxi Fan, Yuke Zhu. Tags: OffRL, LM&Ro. 15 Oct 2023.

Long-range Language Modeling with Self-retrieval
Transactions of the Association for Computational Linguistics (TACL), 2023
Ohad Rubin, Jonathan Berant. Tags: RALM, KELM. 23 Jun 2023.

DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Ming Cheng, Long Xiao. 08 May 2023.

Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge. Tags: LRM. 05 May 2023.

What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement
Neural Information Processing Systems (NeurIPS), 2023
Yotam Alexander, Nimrod De La Vega, Noam Razin, Nadav Cohen. 20 Mar 2023.

Dissociating language and thought in large language models
Kyle Mahowald, Anna A. Ivanova, I. Blank, Nancy Kanwisher, J. Tenenbaum, Evelina Fedorenko. Tags: ELM, ReLM. 16 Jan 2023.

iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer
Jooyeol Yun, Sanghyeon Lee, Minho Park, Jaegul Choo. Tags: ViT. 14 Jul 2022.

Embedding Recycling for Language Models
Findings, 2022
Jon Saad-Falcon, Amanpreet Singh, Luca Soldaini, Mike D'Arcy, Arman Cohan, Doug Downey. Tags: KELM. 11 Jul 2022.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Neural Information Processing Systems (NeurIPS), 2022
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré. Tags: VLM. 27 May 2022.

The NLP Task Effectiveness of Long-Range Transformers
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Guanghui Qin, Yukun Feng, Benjamin Van Durme. 16 Feb 2022.

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
Computer Vision and Pattern Recognition (CVPR), 2022
Chao-Yuan Wu, Yanghao Li, K. Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer. Tags: ViT. 20 Jan 2022.

Well Googled is Half Done: Multimodal Forecasting of New Fashion Product Sales with Image-based Google Trends
Geri Skenderi, Christian Joppi, Matteo Denitto, Marco Cristani. Tags: AI4TS. 20 Sep 2021.

Do Long-Range Language Models Actually Use Long-Range Context?
Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, Mohit Iyyer. Tags: RALM. 19 Sep 2021.

Can Transformers Jump Around Right in Natural Language? Assessing Performance Transfer from SCAN
Rahma Chaabouni, Roberto Dessì, Eugene Kharitonov. 03 Jul 2021.

EchoFilter: End-to-End Neural Network for Acoustic Echo Cancellation
Lu Ma, Song Yang, Y. Gong, Xintian Wang, Zhongqin Wu. 31 May 2021.

Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar, Songlin Yang, Dara Bahri, Philip Pham, J. Rao, Liu Yang, Sebastian Ruder, Donald Metzler. 08 Nov 2020.

Sparsifying Transformer Models with Trainable Representation Pooling
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Michał Pietruszka, Łukasz Borchmann, Łukasz Garncarek. 10 Sep 2020.

Neural Language Generation: Formulation, Methods, and Evaluation
Cristina Garbacea, Qiaozhu Mei. 31 Jul 2020.

Efficient Content-Based Sparse Attention with Routing Transformers
Transactions of the Association for Computational Linguistics (TACL), 2020
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier. Tags: MoE. 12 Mar 2020.

Frustratingly Short Attention Spans in Neural Language Modeling
International Conference on Learning Representations (ICLR), 2017
Michał Daniluk, Tim Rocktäschel, Johannes Welbl, Sebastian Riedel. 15 Feb 2017.