Natural Policy Gradient for Average Reward Non-Stationary RL

23 April 2025
Neharika Jali
Eshika Pathak
Pranay Sharma
Guannan Qu
Gauri Joshi
Abstract

We consider the problem of non-stationary reinforcement learning (RL) in the infinite-horizon average-reward setting. We model it by a Markov Decision Process with time-varying rewards and transition probabilities, subject to a variation budget $\Delta_T$. Existing non-stationary RL algorithms focus on model-based and model-free value-based methods. Policy-based methods, despite their flexibility in practice, are not theoretically well understood in non-stationary RL. We propose and analyze the first model-free policy-based algorithm, Non-Stationary Natural Actor-Critic (NS-NAC), a policy gradient method with restart-based exploration for change and a novel interpretation of learning rates as adapting factors. Further, we present a bandit-over-RL based parameter-free algorithm, BORL-NS-NAC, that does not require prior knowledge of the variation budget $\Delta_T$. We establish a dynamic regret of $\tilde{\mathscr{O}}(|S|^{1/2}|A|^{1/2}\Delta_T^{1/6}T^{5/6})$ for both algorithms, where $T$ is the time horizon and $|S|$, $|A|$ are the sizes of the state and action spaces. The regret analysis leverages a novel adaptation of the Lyapunov function analysis of NAC to dynamic environments and characterizes the effects of simultaneous updates to the policy and the value-function estimate, together with changes in the environment.
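
To make the high-level description concrete, the following is a minimal, illustrative sketch of a restart-based natural actor-critic loop for an average-reward, non-stationary MDP. It is not the authors' pseudocode: the environment interface (reset/step), the variable names, and the step sizes alpha and beta are assumptions, and the restart period restart_len stands in for whatever schedule the paper derives from the variation budget.

import numpy as np

def ns_nac_sketch(env, n_states, n_actions, T, restart_len, alpha=0.01, beta=0.05):
    """Illustrative restart-based natural actor-critic for the
    average-reward setting; all hyperparameters are placeholders."""
    theta = np.zeros((n_states, n_actions))  # softmax policy logits
    V = np.zeros(n_states)                   # differential value estimates
    rho = 0.0                                # average-reward estimate
    s = env.reset()                          # assumed interface: returns initial state
    for t in range(T):
        if t % restart_len == 0:
            # restart-based exploration: forget stale estimates so the
            # policy can re-adapt after the environment has drifted
            theta[:] = 0.0
            V[:] = 0.0
            rho = 0.0
        logits = theta[s] - theta[s].max()
        probs = np.exp(logits) / np.exp(logits).sum()
        a = np.random.choice(n_actions, p=probs)
        s_next, r = env.step(a)              # rewards/transitions may vary with t
        # critic: average-reward TD(0) update
        td = r - rho + V[s_next] - V[s]
        V[s] += beta * td
        rho += beta * td
        # actor: with a tabular softmax policy, the natural policy gradient
        # direction updates the logits by (an estimate of) the advantage,
        # for which the TD error serves as a sample
        theta[s, a] += alpha * td
        s = s_next
    return theta

The restart mechanism above is the simplest way to picture the role of restarts; the paper's analysis additionally interprets the learning rates themselves as adapting factors, and the parameter-free BORL-NS-NAC variant would tune the restart/step-size schedule via a bandit layer rather than assume knowledge of $\Delta_T$.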

@article{jali2025_2504.16415,
  title={Natural Policy Gradient for Average Reward Non-Stationary RL},
  author={Neharika Jali and Eshika Pathak and Pranay Sharma and Guannan Qu and Gauri Joshi},
  journal={arXiv preprint arXiv:2504.16415},
  year={2025}
}