ResearchTrend.AI
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

4 November 2021
Shangtong Zhang
Rémi Tachet des Combes
Romain Laroche

Papers citing "Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch"

11 / 11 papers shown
A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games
Zaiwei Chen
Kaipeng Zhang
Eric Mazumdar
Asuman Ozdaglar
Adam Wierman
48
6
0
03 Mar 2023
Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning
Yizhou Zhang
Guannan Qu
Pan Xu
Yiheng Lin
Zaiwei Chen
Adam Wierman
34
25
0
30 Nov 2022
Robust Constrained Reinforcement Learning
Yue Wang
Fei Miao
Shaofeng Zou
37
12
0
14 Sep 2022
Policy Gradient Method For Robust Reinforcement Learning
Yue Wang
Shaofeng Zou
81
67
0
15 May 2022
Beyond the Policy Gradient Theorem for Efficient Policy Updates in Actor-Critic Algorithms
Romain Laroche
Rémi Tachet des Combes
40
2
0
15 Feb 2022
On the Convergence of SARSA with Linear Function Approximation
Shangtong Zhang
Rémi Tachet des Combes
Romain Laroche
11
10
0
14 Feb 2022
STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence
Liang Xu
Daoming Lyu
Yangchen Pan
Aiwen Jiang
Bo Liu
28
0
0
24 Jan 2022
Truncated Emphatic Temporal Difference Methods for Prediction and Control
Shangtong Zhang
Shimon Whiteson
OffRL
13
11
0
11 Aug 2021
Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
S. Khodadadian
Zaiwei Chen
S. T. Maguluri
CML
OffRL
71
26
0
18 Feb 2021
A Finite Time Analysis of Two Time-Scale Actor Critic Methods
Yue Wu
Weitong Zhang
Pan Xu
Quanquan Gu
90
146
0
04 May 2020
On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation
Harshat Kumar
Alec Koppel
Alejandro Ribeiro
102
79
0
18 Oct 2019