ResearchTrend.AI

arXiv:2112.07328
Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning

14 December 2021
Yunhao Tang
Abstract

Despite the empirical success of meta reinforcement learning (meta-RL), there are still a number of poorly understood discrepancies between theory and practice. Critically, biased gradient estimates are almost always implemented in practice, whereas prior theory on meta-RL only establishes convergence under unbiased gradient estimates. In this work, we investigate this discrepancy. In particular, (1) we show that unbiased gradient estimates have variance $\Theta(N)$, which depends linearly on the sample size $N$ of the inner-loop updates; (2) we propose linearized score function (LSF) gradient estimates, which have bias $\mathcal{O}(1/\sqrt{N})$ and variance $\mathcal{O}(1/N)$; (3) we show that most empirical prior work in fact implements variants of the LSF gradient estimates, which implies that practical algorithms "accidentally" introduce bias to achieve better performance; (4) we establish theoretical guarantees for the LSF gradient estimates in meta-RL, showing convergence to stationary points with better dependence on $N$ than prior work when $N$ is large.
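Claim (1) can be illustrated with a toy example outside the meta-RL setting: a score-function (REINFORCE-style) estimator whose score term sums over $N$ i.i.d. samples has variance that grows linearly in $N$. The sketch below is our own hypothetical illustration, not the paper's estimator; the Gaussian model, the function `sf_grad`, and all constants are assumptions made for demonstration. It estimates $\nabla_\theta \, \mathbb{E}[\bar{x}]$ for $x_i \sim \mathcal{N}(\theta, 1)$ and prints the empirical variance as $N$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def sf_grad(theta, N, trials=20000):
    # Objective: J(theta) = E[xbar] with x_i ~ N(theta, 1), so the true
    # gradient is 1 for every N.
    # Score-function estimator: g(xbar) * sum_i d/dtheta log p_theta(x_i)
    #                         = xbar * sum_i (x_i - theta)
    # The score term is a sum of N independent pieces, which is what
    # drives the Theta(N) variance growth.
    x = rng.normal(theta, 1.0, size=(trials, N))
    xbar = x.mean(axis=1)
    score = (x - theta).sum(axis=1)
    return xbar * score

for N in (10, 100, 1000):
    g = sf_grad(0.5, N)
    # Mean stays near the true gradient (unbiased); variance grows ~ N.
    print(f"N={N:5d}  mean={g.mean():.3f}  var={g.var():.1f}")
```

Running this shows the estimator mean hovering near 1 for every $N$ while the empirical variance grows roughly linearly, matching the $\Theta(N)$ behavior the paper attributes to unbiased estimates.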
