
MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory

Shengtao Zhang
Jiaqian Wang
Ruiwen Zhou
Junwei Liao
Yuchen Feng
Zhuo Li
Yujie Zheng
Weinan Zhang
Ying Wen
Zhiyu Li
Feiyu Xiong
Yutao Qi
Bo Tang
Muning Wen
Main: 8 pages · 11 figures · 13 tables · Bibliography: 4 pages · Appendix: 29 pages
Abstract

The hallmark of human intelligence is the self-evolving ability to master new skills by learning from past experiences. However, current AI agents struggle to emulate this self-evolution: fine-tuning is computationally expensive and prone to catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we propose MemRL, a non-parametric approach that evolves via reinforcement learning on episodic memory. By decoupling stable reasoning from plastic memory, MemRL employs a Two-Phase Retrieval mechanism to filter noise and identify high-utility strategies through environmental feedback. Extensive experiments on HLE, BigCodeBench, ALFWorld, and Lifelong Agent Bench demonstrate that MemRL significantly outperforms state-of-the-art baselines, confirming that MemRL effectively resolves the stability-plasticity dilemma, enabling continuous runtime improvement without weight updates. Code is available at this https URL.
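The abstract's core mechanism — semantic retrieval followed by utility-based filtering, with utilities updated from environmental reward rather than model weights — can be illustrated with a minimal sketch. All names, the toy similarity function, and the incremental utility update below are assumptions for illustration, not the paper's actual implementation:

```python
# Hypothetical sketch of the Two-Phase Retrieval idea: semantic matching
# for recall, then utility-based reranking learned from reward at runtime.
class EpisodicMemory:
    def __init__(self):
        self.entries = []  # each entry: {"text": str, "utility": float}

    def add(self, text):
        self.entries.append({"text": text, "utility": 0.0})

    def _similarity(self, query, text):
        # Toy lexical-overlap score standing in for embedding similarity.
        q, t = set(query.split()), set(text.split())
        return len(q & t) / max(1, len(q | t))

    def retrieve(self, query, k_coarse=10, k_final=3):
        # Phase 1: passive semantic matching (high recall, potentially noisy).
        coarse = sorted(self.entries,
                        key=lambda e: self._similarity(query, e["text"]),
                        reverse=True)[:k_coarse]
        # Phase 2: rerank by learned utility to filter out noisy matches.
        return sorted(coarse, key=lambda e: e["utility"], reverse=True)[:k_final]

    def reinforce(self, used_entries, reward, lr=0.1):
        # Runtime RL step: move each retrieved entry's utility toward the
        # environmental reward -- no model weight updates involved.
        for e in used_entries:
            e["utility"] += lr * (reward - e["utility"])
```

In this reading, the agent's reasoning model stays frozen (stability) while the memory's utility estimates adapt online (plasticity), which is how the decoupling described above would avoid catastrophic forgetting.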
