
ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control

Main: 13 pages · 17 figures · 5 tables · Bibliography: 6 pages · Appendix: 12 pages
Abstract

We consider deep deterministic policy gradient (DDPG) in the context of reinforcement learning with sparse rewards. To enhance exploration, we introduce a search procedure, εt-greedy, which generates exploratory options for visiting less-explored states. We prove that search using εt-greedy has polynomial sample complexity under mild MDP assumptions. To use the information provided by rewarded transitions more efficiently, we develop a new dual experience replay buffer framework, GDRB, and implement longest n-step returns. The resulting algorithm, ETGL-DDPG, integrates all three techniques: εt-greedy, GDRB, and Longest n-step, into DDPG. We evaluate ETGL-DDPG on standard benchmarks and demonstrate that it outperforms DDPG, as well as other state-of-the-art methods, across all tested sparse-reward continuous environments. Ablation studies further highlight how each strategy individually enhances the performance of DDPG in this setting.
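The abstract does not spell out how the longest n-step target is computed; as an illustrative guess, the sketch below assumes each step's target accumulates discounted rewards to the end of the episode and bootstraps from the critic only once, at the final state reached. All names here (longest_nstep_targets, terminal_value) are hypothetical, not the authors' code.

```python
# Hypothetical sketch of a "longest n-step" return target for one episode,
# assuming a single bootstrap from the critic at the episode's final state.
from typing import List

def longest_nstep_targets(
    rewards: List[float],      # r_t for t = 0..T-1
    terminal_value: float,     # assumed critic estimate Q(s_T, pi(s_T)); 0 if terminal
    gamma: float = 0.99,
) -> List[float]:
    """For each step t, return the discounted sum of rewards from t to the
    end of the episode plus one bootstrap at the final state, i.e. the
    longest n-step return available from t."""
    targets = [0.0] * len(rewards)
    running = terminal_value
    # Walk backwards: G_t = r_t + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        targets[t] = running
    return targets

# Example: a sparse-reward episode where only the last transition is rewarded.
print(longest_nstep_targets([0.0, 0.0, 1.0], terminal_value=0.0))
# [0.9801, 0.99, 1.0] for gamma = 0.99
```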
