Evolutionary Policy Optimization

24 March 2025
Jianren Wang
Yifan Su
Abhinav Gupta
Deepak Pathak
Abstract

Despite its extreme sample inefficiency, on-policy reinforcement learning has become a fundamental tool in real-world applications. With recent advances in GPU-driven simulation, the amount of data that can be collected for RL training has scaled exponentially. However, studies show that current on-policy methods, such as PPO, fail to fully leverage the benefits of parallelized environments, leading to performance saturation beyond a certain scale. In contrast, Evolutionary Algorithms (EAs) excel at increasing diversity through randomization, making them a natural complement to RL. Yet existing evolutionary reinforcement learning (EvoRL) methods have struggled to gain widespread adoption due to their extreme sample inefficiency. To address these challenges, we introduce Evolutionary Policy Optimization (EPO), a novel policy gradient algorithm that combines the strengths of EAs and policy gradients. We show that EPO significantly improves performance across diverse and challenging environments, demonstrating superior scalability with parallelized simulations.
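
The abstract describes the EA-plus-policy-gradient combination only at a high level, so below is a minimal, self-contained Python sketch of that general hybrid pattern. It is not the paper's actual EPO algorithm: the toy contextual-bandit task, the REINFORCE-style gradient step, the truncation selection with Gaussian mutation, and every hyperparameter (POP_SIZE, NOISE, and so on) are illustrative assumptions, chosen only to show how per-member policy-gradient updates can be interleaved with an evolutionary selection step.

# Illustrative sketch of a hybrid EA + policy-gradient loop (NOT the paper's
# exact EPO algorithm). A population of linear-softmax policies is trained on
# a hypothetical toy contextual-bandit task; each member takes a REINFORCE
# gradient step on its own rollouts, then an evolutionary step copies and
# perturbs the fittest parameters to maintain population diversity.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 4, 3
POP_SIZE, GENERATIONS, ROLLOUTS = 8, 30, 64
LR, NOISE = 0.1, 0.02

# Assumed toy task: reward depends on (state, action); the optimal action
# differs per state, so a useful policy must condition on the observed state.
REWARD = rng.normal(size=(N_STATES, N_ACTIONS))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def rollout_and_grad(theta):
    """Collect rollouts and a REINFORCE gradient for one population member."""
    grad = np.zeros_like(theta)
    total = 0.0
    for _ in range(ROLLOUTS):
        s = rng.integers(N_STATES)
        probs = softmax(theta[s])
        a = rng.choice(N_ACTIONS, p=probs)
        r = REWARD[s, a] + rng.normal(scale=0.1)
        # grad of log pi(a|s) for a linear-softmax policy, scaled by reward
        g = -probs
        g[a] += 1.0
        grad[s] += r * g
        total += r
    return total / ROLLOUTS, grad / ROLLOUTS

population = [rng.normal(scale=0.1, size=(N_STATES, N_ACTIONS))
              for _ in range(POP_SIZE)]

for gen in range(GENERATIONS):
    fitness = []
    for i, theta in enumerate(population):
        avg_r, grad = rollout_and_grad(theta)
        population[i] = theta + LR * grad  # policy-gradient ascent step
        fitness.append(avg_r)
    # Evolutionary step: truncation selection plus Gaussian mutation.
    order = np.argsort(fitness)[::-1]
    elites = [population[i] for i in order[:POP_SIZE // 2]]
    population = [e.copy() for e in elites] + [
        e + rng.normal(scale=NOISE, size=e.shape) for e in elites
    ]
    print(f"gen {gen:2d}  best avg reward {max(fitness):.3f}")

In this sketch the gradient step drives each member toward higher reward on its own rollouts, while the selection-and-mutation step maintains population diversity, the complementary property the abstract attributes to EAs.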

@article{wang2025_2503.19037,
  title={Evolutionary Policy Optimization},
  author={Jianren Wang and Yifan Su and Abhinav Gupta and Deepak Pathak},
  journal={arXiv preprint arXiv:2503.19037},
  year={2025}
}