ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1901.11275
  4. Cited By
A Theory of Regularized Markov Decision Processes

A Theory of Regularized Markov Decision Processes

31 January 2019
Matthieu Geist
B. Scherrer
Olivier Pietquin
ArXivPDFHTML

Papers citing "A Theory of Regularized Markov Decision Processes"

32 / 32 papers shown
Title
Efficient Learning for Entropy-Regularized Markov Decision Processes via Multilevel Monte Carlo
Efficient Learning for Entropy-Regularized Markov Decision Processes via Multilevel Monte Carlo
Matthieu Meunier
C. Reisinger
Yufei Zhang
75
0
0
27 Mar 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
180
1
0
26 Feb 2025
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Yannis Flet-Berliac
Nathan Grinsztajn
Florian Strub
Bill Wu
Eugene Choi
...
Arash Ahmadian
Yash Chandak
M. G. Azar
Olivier Pietquin
Matthieu Geist
OffRL
128
7
0
17 Jan 2025
Bounded Rationality Equilibrium Learning in Mean Field Games
Bounded Rationality Equilibrium Learning in Mean Field Games
Yannick Eich
Christian Fabian
Kai Cui
Heinz Koeppl
50
0
0
11 Nov 2024
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
157
6
0
07 Nov 2024
Embedding Safety into RL: A New Take on Trust Region Methods
Embedding Safety into RL: A New Take on Trust Region Methods
Nikola Milosevic
Johannes Müller
Nico Scherf
73
2
0
05 Nov 2024
Last Iterate Convergence in Monotone Mean Field Games
Last Iterate Convergence in Monotone Mean Field Games
Noboru Isobe
Kenshi Abe
Kaito Ariu
58
0
0
07 Oct 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
99
2
0
30 May 2024
Regularized Q-Learning with Linear Function Approximation
Regularized Q-Learning with Linear Function Approximation
Jiachen Xi
Alfredo Garcia
P. Momcilovic
76
2
0
26 Jan 2024
Gradient Flows for Regularized Stochastic Control Problems
Gradient Flows for Regularized Stochastic Control Problems
David Siska
Lukasz Szpruch
48
21
0
10 Jun 2020
Soft Actor-Critic Algorithms and Applications
Soft Actor-Critic Algorithms and Applications
Tuomas Haarnoja
Aurick Zhou
Kristian Hartikainen
George Tucker
Sehoon Ha
...
Vikash Kumar
Henry Zhu
Abhishek Gupta
Pieter Abbeel
Sergey Levine
116
2,391
0
13 Dec 2018
Relative Entropy Regularized Policy Iteration
Relative Entropy Regularized Policy Iteration
A. Abdolmaleki
Jost Tobias Springenberg
Jonas Degrave
Steven Bohez
Yuval Tassa
Dan Belov
N. Heess
Martin Riedmiller
49
72
0
05 Dec 2018
Maximum a Posteriori Policy Optimisation
Maximum a Posteriori Policy Optimisation
A. Abdolmaleki
Jost Tobias Springenberg
Yuval Tassa
Rémi Munos
N. Heess
Martin Riedmiller
69
471
0
14 Jun 2018
Reinforcement Learning and Control as Probabilistic Inference: Tutorial
  and Review
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review
Sergey Levine
AI4CE
BDL
62
667
0
02 May 2018
Learning by Playing - Solving Sparse Reward Tasks from Scratch
Learning by Playing - Solving Sparse Reward Tasks from Scratch
Martin Riedmiller
Roland Hafner
Thomas Lampe
Michael Neunert
Jonas Degrave
T. Wiele
Volodymyr Mnih
N. Heess
Jost Tobias Springenberg
81
446
0
28 Feb 2018
Differentiable Dynamic Programming for Structured Prediction and
  Attention
Differentiable Dynamic Programming for Structured Prediction and Attention
A. Mensch
Mathieu Blondel
55
130
0
11 Feb 2018
Path Consistency Learning in Tsallis Entropy Regularized MDPs
Path Consistency Learning in Tsallis Entropy Regularized MDPs
Ofir Nachum
Yinlam Chow
Mohammad Ghavamzadeh
57
45
0
10 Feb 2018
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement
  Learning with a Stochastic Actor
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
241
8,236
0
04 Jan 2018
A short variational proof of equivalence between policy gradients and
  soft Q learning
A short variational proof of equivalence between policy gradients and soft Q learning
Pierre Harvey Richemond
B. Maginnis
37
5
0
22 Dec 2017
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
Justin Fu
Katie Z Luo
Sergey Levine
107
746
0
30 Oct 2017
Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy
  Regularization for Reinforcement Learning
Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
Kyungjae Lee
Sungjoon Choi
Songhwai Oh
49
67
0
19 Sep 2017
A unified view of entropy-regularized Markov decision processes
A unified view of entropy-regularized Markov decision processes
Gergely Neu
Anders Jonsson
Vicencc Gómez
93
255
0
22 May 2017
Equivalence Between Policy Gradients and Soft Q-Learning
Equivalence Between Policy Gradients and Soft Q-Learning
John Schulman
Xi Chen
Pieter Abbeel
OffRL
71
344
0
21 Apr 2017
Bridging the Gap Between Value and Policy Based Reinforcement Learning
Bridging the Gap Between Value and Policy Based Reinforcement Learning
Ofir Nachum
Mohammad Norouzi
Kelvin Xu
Dale Schuurmans
131
470
0
28 Feb 2017
Reinforcement Learning with Deep Energy-Based Policies
Reinforcement Learning with Deep Energy-Based Policies
Tuomas Haarnoja
Haoran Tang
Pieter Abbeel
Sergey Levine
79
1,329
0
27 Feb 2017
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learning
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
OffRL
130
611
0
08 Jun 2016
Guided Cost Learning: Deep Inverse Optimal Control via Policy
  Optimization
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
Chelsea Finn
Sergey Levine
Pieter Abbeel
97
946
0
01 Mar 2016
From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label
  Classification
From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification
André F. T. Martins
Ramón Fernández Astudillo
142
711
0
05 Feb 2016
Asynchronous Methods for Deep Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
M. Berk Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
170
8,805
0
04 Feb 2016
Taming the Noise in Reinforcement Learning via Soft Updates
Taming the Noise in Reinforcement Learning via Soft Updates
Roy Fox
Ari Pakman
Naftali Tishby
44
336
0
28 Dec 2015
Trust Region Policy Optimization
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
254
6,722
0
19 Feb 2015
Dynamic Policy Programming
Dynamic Policy Programming
M. G. Azar
Vicencc Gómez
H. Kappen
78
123
0
12 Apr 2010
1