
MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory

Shengtao Zhang
Jiaqian Wang
Ruiwen Zhou
Junwei Liao
Yuchen Feng
Zhuo Li
Yujie Zheng
Weinan Zhang
Ying Wen
Zhiyu Li
Feiyu Xiong
Yutao Qi
Bo Tang
Muning Wen
Main: 8 pages · 11 figures · 13 tables · Bibliography: 4 pages · Appendix: 29 pages
Abstract

The hallmark of human intelligence is the self-evolving ability to master new skills by learning from past experiences. However, current AI agents struggle to emulate this self-evolution: fine-tuning is computationally expensive and prone to catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we propose MemRL, a non-parametric approach that evolves via reinforcement learning on episodic memory. By decoupling stable reasoning from plastic memory, MemRL employs a Two-Phase Retrieval mechanism to filter noise and identify high-utility strategies through environmental feedback. Extensive experiments on HLE, BigCodeBench, ALFWorld, and Lifelong Agent Bench demonstrate that MemRL significantly outperforms state-of-the-art baselines, confirming that MemRL effectively resolves the stability-plasticity dilemma, enabling continuous runtime improvement without weight updates. Code is available at this https URL.
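The abstract's core mechanism — semantic retrieval followed by utility-based filtering, with utilities updated from environmental reward rather than model weights — can be illustrated with a minimal sketch. All names, the toy similarity function, and the incremental utility update below are assumptions for illustration, not the paper's actual implementation:

```python
# Hypothetical sketch of the Two-Phase Retrieval idea: semantic matching
# for recall, then utility-based reranking learned from reward at runtime.
class EpisodicMemory:
    def __init__(self):
        self.entries = []  # each entry: {"text": str, "utility": float}

    def add(self, text):
        self.entries.append({"text": text, "utility": 0.0})

    def _similarity(self, query, text):
        # Toy lexical-overlap score standing in for embedding similarity.
        q, t = set(query.split()), set(text.split())
        return len(q & t) / max(1, len(q | t))

    def retrieve(self, query, k_coarse=10, k_final=3):
        # Phase 1: passive semantic matching (high recall, potentially noisy).
        coarse = sorted(self.entries,
                        key=lambda e: self._similarity(query, e["text"]),
                        reverse=True)[:k_coarse]
        # Phase 2: rerank by learned utility to filter out noisy matches.
        return sorted(coarse, key=lambda e: e["utility"], reverse=True)[:k_final]

    def reinforce(self, used_entries, reward, lr=0.1):
        # Runtime RL step: move each retrieved entry's utility toward the
        # environmental reward -- no model weight updates involved.
        for e in used_entries:
            e["utility"] += lr * (reward - e["utility"])
```

In this reading, the agent's reasoning model stays frozen (stability) while the memory's utility estimates adapt online (plasticity), which is how the decoupling described above would avoid catastrophic forgetting.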
