ResearchTrend.AI
arXiv: 2403.14156
Policy Mirror Descent with Lookahead

21 March 2024
Kimon Protopapas
Anas Barakat

Papers citing "Policy Mirror Descent with Lookahead"

10 / 10 papers shown
Ordering-based Conditions for Global Convergence of Policy Gradient Methods
Jincheng Mei
Bo Dai
Alekh Agarwal
Mohammad Ghavamzadeh
Csaba Szepesvári
Dale Schuurmans
52
4
0
02 Apr 2025
Functional Acceleration for Policy Mirror Descent
Veronica Chelu
Doina Precup
23
0
0
23 Jul 2024
Policy Mirror Descent Inherently Explores Action Space
Yan Li
Guanghui Lan
OffRL
51
8
0
08 Mar 2023
Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes
Emmeran Johnson
Ciara Pike-Burke
Patrick Rebeschini
26
11
0
22 Feb 2023
A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
Carlo Alfano
Rui Yuan
Patrick Rebeschini
54
15
0
30 Jan 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction
Assaf Hallak
Gal Dalal
Steven Dalton
I. Frosio
Shie Mannor
Gal Chechik
OffRL
OnRL
29
9
0
04 Jul 2021
Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes
Guanghui Lan
87
135
0
30 Jan 2021
On Linear Convergence of Policy Gradient Methods for Finite MDPs
Jalaj Bhandari
Daniel Russo
48
59
0
21 Jul 2020
Beyond the One Step Greedy Approach in Reinforcement Learning
Yonathan Efroni
Gal Dalal
B. Scherrer
Shie Mannor
OffRL
43
48
0
10 Feb 2018