Improving Plasticity in Non-stationary Reinforcement Learning with Evidential Proximal Policy Optimization

3 March 2025
Abdullah Akgul
Gulcin Baykal
Manuel Haußmann
Melih Kandemir
Abstract

On-policy reinforcement learning algorithms use the most recently learned policy to interact with the environment and update it using the latest gathered trajectories, making them well-suited for adapting to non-stationary environments where dynamics change over time. However, previous studies show that they struggle to maintain plasticity – the ability of neural networks to adjust their synaptic connections – with overfitting identified as the primary cause. To address this, we present the first application of evidential learning in an on-policy reinforcement learning setting: Evidential Proximal Policy Optimization (EPPO). EPPO incorporates all sources of error in the critic network's approximation – i.e., the baseline function in advantage calculation – by modeling the epistemic and aleatoric uncertainty contributions to the approximation's total variance. We achieve this by using an evidential neural network, which serves as a regularizer to prevent overfitting. The resulting probabilistic interpretation of the advantage function enables optimistic exploration, thus maintaining plasticity. Through experiments on non-stationary continuous control tasks, where the environment dynamics change at regular intervals, we demonstrate that EPPO outperforms state-of-the-art on-policy reinforcement learning variants in both task-specific and overall return.
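
The variance decomposition described above can be made concrete with a small sketch. Below is a hypothetical PyTorch evidential critic head in the style of deep evidential regression, which parameterizes a Normal-Inverse-Gamma (NIG) distribution over the value estimate and splits its total variance into aleatoric and epistemic parts. The paper's exact EPPO architecture and loss are not reproduced here; the class name, layer sizes, and activations are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialCritic(nn.Module):
    """Hypothetical evidential value head (sketch, not the paper's code).

    Maps a state to the four parameters (gamma, nu, alpha, beta) of a
    Normal-Inverse-Gamma distribution over the value estimate, as in
    deep evidential regression. The NIG posterior yields closed-form
    aleatoric and epistemic variance contributions.
    """

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.head = nn.Linear(hidden, 4)  # raw (gamma, nu, alpha, beta)

    def forward(self, obs: torch.Tensor):
        gamma, raw_nu, raw_alpha, raw_beta = self.head(self.body(obs)).unbind(-1)
        nu = F.softplus(raw_nu)             # nu > 0
        alpha = F.softplus(raw_alpha) + 1.0  # alpha > 1 keeps variances finite
        beta = F.softplus(raw_beta)         # beta > 0

        value = gamma                            # E[mu]: the value estimate
        aleatoric = beta / (alpha - 1.0)         # E[sigma^2]: data noise
        epistemic = beta / (nu * (alpha - 1.0))  # Var[mu]: model uncertainty
        return value, aleatoric, epistemic
```

Under these assumptions, the "optimistic exploration" of the abstract could be realized by adding a bonus proportional to the epistemic standard deviation to the advantage estimate, e.g. A_opt = A + kappa * sqrt(epistemic) for some coefficient kappa; the actual EPPO objective may differ.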

@article{akgül2025_2503.01468,
  title={Improving Plasticity in Non-stationary Reinforcement Learning with Evidential Proximal Policy Optimization},
  author={Abdullah Akgül and Gulcin Baykal and Manuel Haußmann and Melih Kandemir},
  journal={arXiv preprint arXiv:2503.01468},
  year={2025}
}