Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm

17 March 2021
Lin Chen
B. Scherrer
Peter L. Bartlett
    OffRL
Abstract

In this paper, we investigate the sample complexity of policy evaluation in infinite-horizon offline reinforcement learning (also known as the off-policy evaluation problem) with linear function approximation. We identify a hard regime $d\gamma^{2}>1$, where $d$ is the dimension of the feature vector and $\gamma$ is the discount rate. In this regime, for any $q\in[\gamma^{2},1]$, we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is $q/d$ and it requires $\Omega\left(\frac{d}{\gamma^{2}(q-\gamma^{2})\varepsilon^{2}}\exp\left(\Theta(d\gamma^{2})\right)\right)$ samples to approximate the value function up to an additive error $\varepsilon$. Note that this sample complexity lower bound is exponential in $d$. If $q=\gamma^{2}$, even infinite data cannot suffice. Under the low distribution shift assumption, we show that there is an algorithm that needs at most $O\left(\max\left\{\frac{\|\theta^{\pi}\|_{2}^{4}}{\varepsilon^{4}}\log\frac{d}{\delta},\,\frac{1}{\varepsilon^{2}}\left(d+\log\frac{1}{\delta}\right)\right\}\right)$ samples (where $\theta^{\pi}$ is the parameter of the policy in linear function approximation) and guarantees approximation of the value function up to an additive error of $\varepsilon$ with probability at least $1-\delta$.
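For orientation, the sketch below illustrates the standard setting the abstract refers to: the value function is approximated linearly as $V^{\pi}(s)\approx\phi(s)^{\top}\theta^{\pi}$ with $d$-dimensional features $\phi$ and discount $\gamma$, and $\theta^{\pi}$ is estimated from offline transitions. It is a minimal, generic LSTD-style estimator, not necessarily the specific algorithm analyzed in the paper (whose guarantee above additionally requires the low distribution shift assumption); the function name, the ridge regularizer, and the data layout are illustrative assumptions.

import numpy as np

def lstd_policy_evaluation(phi_s, phi_s_next, rewards, gamma, ridge=1e-6):
    """Generic LSTD-style off-policy evaluation with linear features.

    phi_s      : (n, d) features of sampled states s_i
    phi_s_next : (n, d) features of next states s_i' drawn under the target policy pi
    rewards    : (n,)   observed rewards r_i
    gamma      : discount factor in [0, 1)
    ridge      : small regularizer for numerical stability (illustrative choice)

    Returns theta_hat such that V^pi(s) is approximated by phi(s) @ theta_hat.
    """
    n, d = phi_s.shape
    # A_hat = (1/n) * sum_i phi(s_i) (phi(s_i) - gamma * phi(s_i'))^T
    A_hat = phi_s.T @ (phi_s - gamma * phi_s_next) / n
    # b_hat = (1/n) * sum_i phi(s_i) * r_i
    b_hat = phi_s.T @ rewards / n
    # Solve the (regularized) linear system A_hat @ theta = b_hat
    theta_hat = np.linalg.solve(A_hat + ridge * np.eye(d), b_hat)
    return theta_hat

The quality of such an estimator depends on how well the offline data covers the directions relevant to the target policy; the paper's lower bound shows that when $d\gamma^{2}>1$ and the feature covariance matrix is poorly conditioned, no estimator can avoid a sample complexity exponential in $d$.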
