Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

Abstract

Conditional decision generation with diffusion models has shown strong competitiveness in reinforcement learning (RL). Recent studies reveal the relationship between energy-function-guided diffusion models and constrained RL problems. The main challenge lies in estimating the intermediate energy, which is intractable during the generation process due to its log-expectation formulation. To address this issue, we propose Analytic Energy-guided Policy Optimization (AEPO). Specifically, we first provide a theoretical analysis and a closed-form solution for the intermediate guidance when the diffusion model obeys a conditional Gaussian transformation. We then analyze the posterior Gaussian distribution in the log-expectation formulation and derive a target estimate of the log-expectation under mild assumptions. Finally, we train an intermediate energy neural network to approximate this target estimate. We apply our method to more than 30 offline RL tasks to demonstrate its effectiveness. Extensive experiments illustrate that our method surpasses numerous representative baselines on the D4RL offline reinforcement learning benchmark.
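
The abstract describes training an intermediate energy network toward a target estimate of the log-expectation. The snippet below is a minimal, hypothetical sketch of that final step only (it is not the authors' released code); the network name, the regression loss, and the placeholder target_log_expectation argument are assumptions for illustration, and the closed-form computation of the target derived in the paper is not shown.

# Hypothetical sketch: regress an intermediate energy network E_phi(x_t, t)
# toward a precomputed target estimate of the log-expectation
# -log E_{x0 ~ q(x0|x_t)}[exp(-E(x0))] (the standard intermediate energy in
# energy-guided diffusion; how AEPO computes the target is not shown here).
import torch
import torch.nn as nn

class IntermediateEnergyNet(nn.Module):
    """Assumed MLP that predicts the intermediate energy at noise level t."""
    def __init__(self, x_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Condition on the diffusion timestep by concatenating it to x_t.
        return self.net(torch.cat([x_t, t[:, None]], dim=-1)).squeeze(-1)

def training_step(model, optimizer, x_t, t, target_log_expectation):
    """One regression step toward the target estimate of the log-expectation."""
    pred = model(x_t, t)
    loss = torch.mean((pred - target_log_expectation) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

At sampling time, the gradient of such a learned intermediate energy with respect to x_t would typically be added to the diffusion model's score as the guidance term; this usage note reflects standard energy-guided diffusion practice rather than details stated in the abstract.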

@article{hu2025_2505.01822,
  title={Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning},
  author={Jifeng Hu and Sili Huang and Zhejian Yang and Shengchao Hu and Li Shen and Hechang Chen and Lichao Sun and Yi Chang and Dacheng Tao},
  journal={arXiv preprint arXiv:2505.01822},
  year={2025}
}