A unified view of entropy-regularized Markov decision processes

22 May 2017

Anders Jonsson

Papers citing "A unified view of entropy-regularized Markov decision processes"

50 / 76 papers shown

A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance Axel Friedrich Wolter Tobias Sutter OffRL 37 0 0 07 May 2025
Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees Yongtao Wu Luca Viano Yihang Chen Zhenyu Zhu Kimon Antonakopoulos Quanquan Gu V. Cevher 56 0 0 18 Feb 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning Rémy Hosseinkhan Boucher Onofrio Semeraro L. Mathelin 82 0 0 28 Jan 2025
Divergence-Augmented Policy Optimization Qing Wang Yingru Li Jiechao Xiong Tong Zhang OffRL 47 16 0 28 Jan 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF Heyang Zhao Chenlu Ye Quanquan Gu Tong Zhang OffRL 57 3 0 07 Nov 2024
Embedding Safety into RL: A New Take on Trust Region Methods Nikola Milosevic Johannes Müller Nico Scherf 25 1 0 05 Nov 2024
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization Timofei Gritsaev Nikita Morozov S. Samsonov D. Tiapkin 21 0 0 20 Oct 2024
A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints Bram De Cooman Johan A. K. Suykens 38 0 0 25 Apr 2024
Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization Daniel Jarne Ornia Giannis Delimpaltadakis Jens Kober Javier Alonso-Mora 28 2 0 30 Nov 2023
A Large Deviations Perspective on Policy Gradient Algorithms Wouter Jongeneel Daniel Kuhn Mengmeng Li 31 1 0 13 Nov 2023
Multi-Player Zero-Sum Markov Games with Networked Separable Interactions Chanwoo Park Kaipeng Zhang Asuman Ozdaglar 30 8 0 13 Jul 2023
LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning Outongyi Lv Bingxin Zhou OffRL 44 0 0 05 Jul 2023
Identifiability and Generalizability in Constrained Inverse Reinforcement Learning Andreas Schlaginhaufen Maryam Kamgarpour 29 10 0 01 Jun 2023
Offline Primal-Dual Reinforcement Learning for Linear MDPs Germano Gabbianelli Gergely Neu Nneka Okolo Matteo Papini OffRL 29 7 0 22 May 2023
Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization Haoran Xu Li Jiang Jianxiong Li Zhuoran Yang Zhaoran Wang Victor Chan Xianyuan Zhan OffRL 36 73 0 28 Mar 2023
Fast Rates for Maximum Entropy Exploration D. Tiapkin Denis Belomestny Daniele Calandriello Eric Moulines Rémi Munos A. Naumov Pierre Perrault Yunhao Tang Michal Valko Pierre Menard 44 18 0 14 Mar 2023
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage Masatoshi Uehara Nathan Kallus Jason D. Lee Wen Sun OffRL 50 5 0 05 Feb 2023
A general Markov decision process formalism for action-state entropy-regularized reward maximization D. Grytskyy Jorge Ramírez-Ruiz R. Moreno-Bote 22 3 0 02 Feb 2023
Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods Gen Li Yanxi Chen Yu Huang Yuejie Chi H. Vincent Poor Yuxin Chen OT 46 5 0 30 Jan 2023
Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization Gergely Neu Nneka Okolo 34 6 0 21 Oct 2022
RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk J. Hau Marek Petrik Mohammad Ghavamzadeh R. Russel 32 5 0 09 Sep 2022
Variational Inference for Model-Free and Model-Based Reinforcement Learning Felix Leibfried OffRL 23 0 0 04 Sep 2022
Entropy Augmented Reinforcement Learning Jianfei Ma 36 0 0 19 Aug 2022
Bayesian regularization of empirical MDPs Samarth Gupta Daniel N. Hill Lexing Ying Inderjit Dhillon OffRL 29 0 0 03 Aug 2022
Performative Reinforcement Learning Debmalya Mandal Stelios Triantafyllou Goran Radanović 33 18 0 30 Jun 2022
Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games Sihan Zeng Thinh T. Doan Justin Romberg 71 22 0 27 May 2022
Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act Alexis Jacq Johan Ferret Olivier Pietquin M. Geist 32 9 0 16 Mar 2022
Accelerating Primal-dual Methods for Regularized Markov Decision Processes Haoya Li Hsiang-Fu Yu Lexing Ying Inderjit Dhillon 34 4 0 21 Feb 2022
Adversarially Trained Actor Critic for Offline Reinforcement Learning Ching-An Cheng Tengyang Xie Nan Jiang Alekh Agarwal OffRL 16 127 0 05 Feb 2022
You May Not Need Ratio Clipping in PPO Mingfei Sun Vitaly Kurin Guoqing Liu Sam Devlin Tao Qin Katja Hofmann Shimon Whiteson 13 15 0 31 Jan 2022
Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime B. Kerimkulov J. Leahy David Siska Lukasz Szpruch 30 11 0 18 Jan 2022
Mirror Learning: A Unifying Framework of Policy Optimisation J. Kuba Christian Schroeder de Witt Jakob N. Foerster 26 24 0 07 Jan 2022
Approximate Newton policy gradient algorithms Haoya Li Samarth Gupta Hsiangfu Yu Lexing Ying Inderjit Dhillon 51 2 0 05 Oct 2021
Batch size-invariance for policy optimization Jacob Hilton K. Cobbe John Schulman 17 11 0 01 Oct 2021
Divergence-Regularized Multi-Agent Actor-Critic Kefan Su Zongqing Lu 46 25 0 01 Oct 2021
Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods Xin Guo Anran Hu Junzi Zhang OffRL 28 6 0 13 Sep 2021
A Survey of Exploration Methods in Reinforcement Learning Susan Amin Maziar Gomrokchi Harsh Satija H. V. Hoof Doina Precup OffRL 32 80 0 01 Sep 2021
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences Alan Chan Hugo Silva Sungsu Lim Tadashi Kozuno A. R. Mahmood Martha White 25 29 0 17 Jul 2021
The Confluence of Networks, Games and Learning Tao Li Guanze Peng Quanyan Zhu Tamer Basar AI4CE 8 46 0 17 May 2021
Reinforcement learning of rare diffusive dynamics Avishek Das Dominic C. Rose J. P. Garrahan David T. Limmer 24 27 0 10 May 2021
Safe Chance Constrained Reinforcement Learning for Batch Process Control M. Mowbray Panagiotis Petsagkourakis Ehecatl Antonio del Rio Chanona Dongda Zhang OffRL 34 34 0 23 Apr 2021
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation Andrea Zanette Ching-An Cheng Alekh Agarwal 32 52 0 24 Mar 2021
Near Optimal Policy Optimization via REPS Aldo Pacchiano Jonathan Lee Peter L. Bartlett Ofir Nachum 23 3 0 17 Mar 2021
Maximum Entropy RL (Provably) Solves Some Robust RL Problems Benjamin Eysenbach Sergey Levine OOD 50 175 0 10 Mar 2021
Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs Jiafan He Dongruo Zhou Quanquan Gu 95 24 0 17 Feb 2021
Optimization Issues in KL-Constrained Approximate Policy Iteration N. Lazić Botao Hao Yasin Abbasi-Yadkori Dale Schuurmans Csaba Szepesvári 19 10 0 11 Feb 2021
Approximately Solving Mean Field Games via Entropy-Regularized Deep Reinforcement Learning Kai Cui Heinz Koeppl 64 91 0 02 Feb 2021
A Tutorial on Sparse Gaussian Processes and Variational Inference Felix Leibfried Vincent Dutordoir S. T. John N. Durrande GP 42 49 0 27 Dec 2020
Sample Efficient Reinforcement Learning with REINFORCE Junzi Zhang Jongho Kim Brendan O'Donoghue Stephen P. Boyd 42 101 0 22 Oct 2020
Logistic Q-Learning Joan Bas-Serrano Sebastian Curi Andreas Krause Gergely Neu 14 40 0 21 Oct 2020