Expected Sarsa(λ) with Control Variate for Variance Reduction

25 June 2019
Long Yang
Yu Zhang
Jun Wen
Qian Zheng
Pengfei Li
Gang Pan
arXiv:1906.11058
Abstract

Off-policy learning is powerful for reinforcement learning. However, the high variance of off-policy evaluation is a critical challenge, which can drive off-policy learning into uncontrolled instability. In this paper, to reduce this variance, we introduce the control variate technique into Expected Sarsa(λ) and propose a tabular ES(λ)-CV algorithm. We prove that, given a proper estimator of the value function, the proposed ES(λ)-CV enjoys lower variance than Expected Sarsa(λ). Furthermore, to extend ES(λ)-CV to a convergent algorithm with linear function approximation, we propose the GES(λ) algorithm under a convex-concave saddle-point formulation. We prove that the convergence rate of GES(λ) achieves O(1/T), which matches or outperforms many state-of-the-art gradient-based algorithms, under a more relaxed condition. Numerical experiments show that the proposed algorithm performs better, with lower variance, than several state-of-the-art gradient-based TD learning algorithms: GQ(λ), GTB(λ), and ABQ(ζ).
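The variance reduction the abstract describes rests on the classical control variate identity: subtracting a zero-mean, correlated quantity from an estimator leaves its expectation unchanged while shrinking its variance by a factor of (1 − corr²). The sketch below illustrates that identity on a toy estimation problem; it is not the paper's ES(λ)-CV algorithm (in the paper, an estimate of the value function plays the role of the control variate), and the setup and variable names are illustrative assumptions.

```python
# Minimal sketch of the control variate principle underlying ES(lambda)-CV.
# Toy setup (assumed for illustration): estimate E[X] from noisy samples,
# using a correlated control variate Y whose mean (here 0) is known exactly.
import numpy as np

rng = np.random.default_rng(0)

n = 100_000
y = rng.normal(loc=0.0, scale=1.0, size=n)   # control variate, E[Y] = 0
x = 2.0 + y + rng.normal(scale=0.5, size=n)  # target samples, E[X] = 2

# Optimal coefficient c* = Cov(X, Y) / Var(Y).
c = np.cov(x, y)[0, 1] / np.var(y)

plain = x                      # vanilla estimator of E[X]
adjusted = x - c * (y - 0.0)   # control-variate estimator, still unbiased

print(f"mean (plain):    {plain.mean():.4f},  var: {plain.var():.4f}")
print(f"mean (adjusted): {adjusted.mean():.4f},  var: {adjusted.var():.4f}")
# Both means estimate E[X] = 2, but the adjusted estimator's variance is
# reduced by the factor (1 - corr(X, Y)^2), here roughly 5x smaller.
```

This mirrors the abstract's claim at a high level: the control variate must itself be a good (correlated) quantity for the variance reduction to materialize, which is why ES(λ)-CV's guarantee is conditioned on a proper estimator of the value function.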
