
TeleMem: Building Long-Term and Multimodal Memory for Agentic AI

Chunliang Chen
Ming Guan
Xiao Lin
Jiaxu Li
Luxi Lin
Qiyi Wang
Xiangyu Chen
Jixiang Luo
Changzhi Sun
Dell Zhang
Xuelong Li
Main: 9 pages, 6 figures, 3 tables · Bibliography: 2 pages · Appendix: 1 page
Abstract

Large language models (LLMs) excel at many NLP tasks but struggle to sustain long-term interactions due to limited attention over extended dialogue histories. Retrieval-augmented generation (RAG) mitigates this issue but lacks reliable mechanisms for updating or refining stored memories, leading to schema-driven hallucinations, inefficient write operations, and minimal support for multimodal data. To address these challenges, we propose TeleMem, a unified long-term and multimodal memory system that maintains coherent user profiles through narrative dynamic extraction, ensuring that only dialogue-grounded information is preserved. TeleMem further introduces a structured writing pipeline that batches, retrieves, clusters, and consolidates memory entries, substantially improving storage efficiency, reducing token usage, and accelerating memory operations. Additionally, a multimodal memory module combined with ReAct-style reasoning equips the system with a closed-loop observe-think-act process, enabling accurate understanding of complex video content in long-term contexts. Experimental results show that TeleMem surpasses the state-of-the-art Mem0 baseline on the ZH-4O long-term role-play gaming benchmark, with 19% higher accuracy, 43% fewer tokens, and a 2.1x speedup.
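
The abstract describes the writing pipeline only at a high level (batch, retrieve, cluster, consolidate). The sketch below shows one way such a loop could be wired together; `MemoryStore`, `overlap`, `cluster`, and `consolidate` are hypothetical stand-ins (a lexical-overlap retriever and a string-join consolidator in place of the embedding retrieval and LLM-based consolidation a real system would use), not TeleMem's actual components.

```python
# Sketch of a batched memory-writing pipeline: batch new facts, retrieve
# related stored memories, cluster them, and write one consolidated entry
# per cluster. All names here are illustrative assumptions.

from dataclasses import dataclass, field


def overlap(a: str, b: str) -> int:
    # Token overlap as a crude relevance score (stand-in for embeddings).
    return len(set(a.lower().split()) & set(b.lower().split()))


@dataclass
class MemoryStore:
    entries: list[str] = field(default_factory=list)

    def retrieve(self, query: str, k: int = 5) -> list[str]:
        # Top-k entries by lexical overlap with the query.
        return sorted(self.entries, key=lambda e: -overlap(query, e))[:k]

    def write(self, entry: str) -> None:
        self.entries.append(entry)


def cluster(candidates: list[str], threshold: int = 2) -> list[list[str]]:
    # Greedy single-pass grouping by overlap (stand-in for semantic clustering).
    clusters: list[list[str]] = []
    for c in candidates:
        for group in clusters:
            if overlap(c, group[0]) >= threshold:
                group.append(c)
                break
        else:
            clusters.append([c])
    return clusters


def consolidate(group: list[str]) -> str:
    # Stand-in for an LLM call that merges related entries into one
    # dialogue-grounded fact, so each cluster costs a single write.
    return " | ".join(dict.fromkeys(group))


def write_batch(store: MemoryStore, new_facts: list[str]) -> None:
    """Batch new facts, retrieve related memories, cluster, and consolidate."""
    related = {m for fact in new_facts for m in store.retrieve(fact, k=3)}
    for old in related:
        store.entries.remove(old)  # old entries are folded into the consolidated ones
    for group in cluster(new_facts + list(related)):
        store.write(consolidate(group))


if __name__ == "__main__":
    store = MemoryStore(entries=["user likes strategy games"])
    write_batch(store, ["user plays strategy games every weekend",
                        "user's favourite hero is a mage"])
    print(store.entries)
```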

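The closed-loop observe, think, and act process is likewise only named in the abstract, not specified. Below is a minimal ReAct-style loop under assumed interfaces: the `llm` decision function, the tool names (`observe_frame`, `recall_memory`), and the decision format are illustrative, not TeleMem's API.

```python
# Minimal ReAct-style loop: think about the next step, act with a tool
# (sample a video frame or recall a long-term memory), observe the result.

from typing import Callable


def react_loop(question: str,
               llm: Callable[[str, list], dict],
               tools: dict[str, Callable[[str], str]],
               max_steps: int = 5) -> str | None:
    trace: list[tuple[dict, str]] = []
    for _ in range(max_steps):
        # Think: the model proposes the next action given the trace so far.
        decision = llm(question, trace)
        if decision["action"] == "answer":
            return decision["arg"]
        # Act: invoke the chosen tool.
        observation = tools[decision["action"]](decision["arg"])
        # Observe: feed the result back so the next thought can use it.
        trace.append((decision, observation))
    return None


if __name__ == "__main__":
    # Toy stand-ins: a frame-captioning tool, a memory-recall tool, and a
    # scripted "LLM" that recalls memory, looks at a frame, then answers.
    tools = {
        "observe_frame": lambda t: f"frame@{t}: the player opens a chest",
        "recall_memory": lambda q: "memory: the player promised to share loot",
    }

    def scripted_llm(question: str, trace: list) -> dict:
        if not trace:
            return {"action": "recall_memory", "arg": question}
        if len(trace) == 1:
            return {"action": "observe_frame", "arg": "00:42"}
        return {"action": "answer", "arg": "They open the chest to share the loot."}

    print(react_loop("Why does the player open the chest?", scripted_llm, tools))
```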