ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.10486
25
3

On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization

18 June 2023
Mudit Gaur
Amrit Singh Bedi
Di-di Wang
Vaneet Aggarwal
ArXivPDFHTML
Abstract

Actor-critic algorithms have shown remarkable success in solving state-of-the-art decision-making problems. However, despite their empirical effectiveness, their theoretical underpinnings remain relatively unexplored, especially with neural network parametrization. In this paper, we delve into the study of a natural actor-critic algorithm that utilizes neural networks to represent the critic. Our aim is to establish sample complexity guarantees for this algorithm, achieving a deeper understanding of its performance characteristics. To achieve that, we propose a Natural Actor-Critic algorithm with 2-Layer critic parametrization (NAC2L). Our approach involves estimating the QQQ-function in each iteration through a convex optimization problem. We establish that our proposed approach attains a sample complexity of O~(1ϵ4(1−γ)4)\tilde{\mathcal{O}}\left(\frac{1}{\epsilon^{4}(1-\gamma)^{4}}\right)O~(ϵ4(1−γ)41​). In contrast, the existing sample complexity results in the literature only hold for a tabular or linear MDP. Our result, on the other hand, holds for countable state spaces and does not require a linear or low-rank structure on the MDP.

View on arXiv
Comments on this paper