Memformer: A Memory-Augmented Transformer for Sequence Modeling
Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, A. Geramifard, Zhou Yu
14 October 2020 (arXiv:2010.06891)
Papers citing "Memformer: A Memory-Augmented Transformer for Sequence Modeling" (41 papers shown):
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models. Adam Filipek. 03 Oct 2025.
Vision encoders should be image size agnostic and task driven. Nedyalko Prisadnikov, Danda Pani Paudel, Yuqian Fu, Luc Van Gool. 22 Aug 2025.
Memory-Augmented Transformers: A Systematic Review from Neuroscience Principles to Enhanced Model Architectures. Parsa Omidi, Xingshuai Huang, Axel Laborieux, Bahareh Nikpour, Tianyu Shi, A. Eshaghi. 14 Aug 2025.
Goal-Based Vision-Language Driving. Santosh Patapati, Trisanth Srinivasan. 30 Jul 2025.
Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts. Yifei Yu, Qian Zhang, Lingfeng Qiao, Di Yin, Fang Li, Jie Wang, Zheyu Chen, Suncong Zheng, Xiaolong Liang, Xingwu Sun. 07 Apr 2025.
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs. Sumin An, Junyoung Sung, Wonpyo Park, Chanjun Park, Paul Hongsuck Seo. North American Chapter of the Association for Computational Linguistics (NAACL), 2025. 10 Feb 2025.
Episodic memory in AI agents poses risks that should be studied and mitigated. Chad DeChant. 20 Jan 2025.
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios. Shantanu Jaiswal, Debaditya Roy, Basura Fernando, Cheston Tan. Neural Information Processing Systems (NeurIPS), 2024. 20 Nov 2024.
ACER: Automatic Language Model Context Extension via Retrieval. Luyu Gao, Yunyi Zhang, Jamie Callan. 11 Oct 2024.
Token Turing Machines are Efficient Vision Models. Purvish Jajal, Nick Eliopoulos, Benjamin Shiue-Hal Chou, George K. Thiravathukal, James C. Davis, Yung-Hsiang Lu. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024. 11 Sep 2024.
You Only Use Reactive Attention Slice For Long Context Retrieval. Yun Joon Soh, Hanxian Huang, Yuandong Tian, Jishen Zhao. 03 Sep 2024.
MambaEVT: Event Stream based Visual Object Tracking using State Space Model. Xiao Wang, Chao Wang, Shiao Wang, Xixi Wang, Zhicheng Zhao, Lin Zhu, Bo Jiang. 20 Aug 2024.
Retrieval-Enhanced Machine Learning: Synthesis and Opportunities. To Eun Kim, Alireza Salemi, Andrew Drozdov, Fernando Diaz, Hamed Zamani. 17 Jul 2024.
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack. Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, Artyom Sorokin, Andrey Kravchenko. Neural Information Processing Systems (NeurIPS), 2024. 14 Jun 2024.
Multi-Modal Retrieval For Large Language Model Based Speech Recognition. J. Kolehmainen, Aditya Gourav, Prashanth Gurunath Shivakumar, Yile Gu, Ankur Gandhe, Ariya Rastrow, Grant P. Strimel, I. Bulyko. Annual Meeting of the Association for Computational Linguistics (ACL), 2024. 13 Jun 2024.
LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions. Victor Agostinelli, Sanghyun Hong, Lizhong Chen. International Conference on Machine Learning (ICML), 2024. 18 May 2024.
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving. Pai Zeng, Zhenyu Ning, Jieru Zhao, Weihao Cui, Mengwei Xu, Liwei Guo, Xusheng Chen, Yizhou Shan. 18 May 2024.
MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory. Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, Mohsen Fayyaz, Hinrich Schütze. 17 Apr 2024.
On Difficulties of Attention Factorization through Shared Memory. Uladzislau Yorsh, Martin Holevna, Ondrej Bojar, David Herel. 31 Mar 2024.
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens. Cunxiang Wang, Ruoxi Ning, Boqi Pan, Tonghui Wu, Qipeng Guo, ..., Guangsheng Bao, Xiangkun Hu, Zheng Zhang, Qian Wang, Yue Zhang. 18 Mar 2024.
LLM Inference Unveiled: Survey and Roofline Model Insights. Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, ..., Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer. 26 Feb 2024.
In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss. Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Andrey Kravchenko. 16 Feb 2024.
Sound Source Separation Using Latent Variational Block-Wise Disentanglement. Karim Helwani, M. Togami, Paris Smaragdis, Michael M. Goodwin. 08 Feb 2024.
MEMORYLLM: Towards Self-Updatable Large Language Models. Yu Wang, Yifan Gao, Xiusi Chen, Haoming Jiang, Shiyang Li, ..., Zheng Li, Xian Li, Bing Yin, Jingbo Shang, Julian McAuley. 07 Feb 2024.
Investigating Recurrent Transformers with Dynamic Halt. Jishnu Ray Chowdhury, Cornelia Caragea. 01 Feb 2024.
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey. Saurav Pawar, S.M. Towhidul Islam Tonmoy, S. M. M. Zaman, Vinija Jain, Vasu Sharma, Amitava Das. 15 Jan 2024.
Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing. Zi Yang, Nan Hua. 10 Jan 2024.
Uncertainty Guided Global Memory Improves Multi-Hop Question Answering. Alsu Sagirova, Andrey Kravchenko. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 29 Nov 2023.
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey. Yunpeng Huang, Jingwei Xu, Junyu Lai, Zixu Jiang, Taolue Chen, ..., Xiaoxing Ma, Lijuan Yang, Zhou Xin, Shupeng Li, Penghao Zhao. 21 Nov 2023.
From Interpolation to Extrapolation: Complete Length Generalization for Arithmetic Transformers. Shaoxiong Duan, Yining Shi, Wei Xu. 18 Oct 2023.
A Framework for Inference Inspired by Human Memory Mechanisms. Xiangyu Zeng, Jie Lin, Piao Hu, Ruizheng Huang, Zhicheng Zhang. International Conference on Learning Representations (ICLR), 2023. 01 Oct 2023.
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models. Qingyue Wang, Y. Fu, Yanan Cao, Zhiliang Tian, Dacheng Tao. 29 Aug 2023.
A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised Traffic Accident Detection in Driving Videos. Rongqin Liang, Yuanman Li, Yingxin Yi, Jiantao Zhou, Xia Li. 27 Jul 2023.
Extending Context Window of Large Language Models via Positional Interpolation. Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian. 27 Jun 2023.
Diable: Efficient Dialogue State Tracking as Operations on Tables. Pietro Lesci, Yoshinari Fujinuma, Momchil Hardalov, Chao Shang, Yassine Benajiba, Lluís Marquez. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. 26 May 2023.
Landmark Attention: Random-Access Infinite Context Length for Transformers. Amirkeivan Mohtashami, Martin Jaggi. Neural Information Processing Systems (NeurIPS), 2023. 25 May 2023.
Memory Efficient Neural Processes via Constant Memory Attention Block. Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed. International Conference on Machine Learning (ICML), 2023. 23 May 2023.
HIINT: Historical, Intra- and Inter- personal Dynamics Modeling with Cross-person Memory Transformer. Y. Kim, Dong Won Lee, Paul Pu Liang, Sharifa Alghowinem, C. Breazeal, Hae Won Park. International Conference on Multimodal Interaction (ICMI), 2023. 21 May 2023.
A Lexical-aware Non-autoregressive Transformer-based ASR Model. Chong Lin, Kuan-Yu Chen. Interspeech, 2023. 18 May 2023.
Scaling Transformer to 1M tokens and beyond with RMT. Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Andrey Kravchenko. 19 Apr 2023.
Improving Autoregressive NLP Tasks via Modular Linearized Attention. Victor Agostinelli, Lizhong Chen. 17 Apr 2023.