Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

17 July 2021

Papers citing "Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences"

Title
Trust-Region Twisted Policy Improvement | Joery A. de Vries, Jinke He, Yaniv Oren, M. Spaan | OffRL, LRM | 30 0 0 | 08 Apr 2025
Value Improved Actor Critic Algorithms | Yaniv Oren, Moritz A. Zanger, Pascal R. van der Vaart, M. Spaan, Wendelin Bohmer | OffRL | 31 0 0 | 03 Jun 2024
Shared learning of powertrain control policies for vehicle fleets | Lindsey Kerbel, B. Ayalew, Andrej Ivanco | - | 27 0 0 | 27 Apr 2024
HesScale: Scalable Computation of Hessian Diagonals | Mohamed Elsayed, A. R. Mahmood | - | 14 7 0 | 20 Oct 2022
Critic Sequential Monte Carlo | Vasileios Lioutas, J. Lavington, Justice Sefas, Matthew Niedoba, Yunpeng Liu, Berend Zwartsenberg, Setareh Dabiri, Frank D. Wood, Adam Scibior | - | 40 7 0 | 30 May 2022
The Geometry of Robust Value Functions | Kaixin Wang, Navdeep Kumar, Kuangqi Zhou, Bryan Hooi, Jiashi Feng, Shie Mannor | AAML | 11 7 0 | 30 Jan 2022
Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes | Guanghui Lan | - | 87 136 0 | 30 Jan 2021
Ensuring Monotonic Policy Improvement in Entropy-regularized Value-based Reinforcement Learning | Lingwei Zhu, Takamitsu Matsubara | - | 17 4 0 | 25 Aug 2020
CAQL: Continuous Action Q-Learning | Moonkyung Ryu, Yinlam Chow, Ross Anderson, Christian Tjandraatmadja, Craig Boutilier | - | 197 42 0 | 26 Sep 2019
Input Convex Neural Networks | Brandon Amos, Lei Xu, J. Zico Kolter | - | 173 597 0 | 22 Sep 2016