Reward is enough for convex MDPs

1 June 2021
Tom Zahavy
Brendan O'Donoghue
Guillaume Desjardins
Satinder Singh

Papers citing "Reward is enough for convex MDPs"

21 / 21 papers shown
Online Episodic Convex Reinforcement Learning
B. Moreno
Khaled Eldowa
Pierre Gaillard
Margaux Brégère
Nadia Oudjane
OffRL
29
0
0
12 May 2025
Is there Value in Reinforcement Learning?
Lior Fox
Y. Loewenstein
OffRL
64
0
0
07 May 2025
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
Timofei Gritsaev
Nikita Morozov
S. Samsonov
D. Tiapkin
21
0
0
20 Oct 2024
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang
Lei Ying
OffRL
37
2
0
25 Sep 2024
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura
Tadashi Kozuno
Wataru Kumagai
Kenta Hoshino
Y. Hosoe
Kazumi Kasaura
Masashi Hamaya
Paavo Parmas
Yutaka Matsuo
72
0
0
29 Aug 2024
Three Dogmas of Reinforcement Learning
David Abel
Mark K. Ho
A. Harutyunyan
38
5
0
15 Jul 2024
Global Reinforcement Learning: Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods
Ric De Santi
Manish Prajapat
Andreas Krause
36
3
0
13 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
67
1
0
11 Jun 2024
MetaCURL: Non-stationary Concave Utility Reinforcement Learning
B. Moreno
Margaux Brégère
Pierre Gaillard
Nadia Oudjane
OffRL
39
0
0
30 May 2024
A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints
Bram De Cooman
Johan A. K. Suykens
35
0
0
25 Apr 2024
A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees
Toshinori Kitamura
Tadashi Kozuno
Masahiro Kato
Yuki Ichihara
Soichiro Nishimori
Akiyoshi Sannai
Sho Sonoda
Wataru Kumagai
Yutaka Matsuo
42
2
0
31 Jan 2024
Submodular Reinforcement Learning
Manish Prajapat
Mojmír Mutný
M. Zeilinger
Andreas Krause
OffRL
30
12
0
25 Jul 2023
Optimal Exploration for Model-Based RL in Nonlinear Systems
Andrew Wagenmaker
Guanya Shi
Kevin G. Jamieson
36
14
0
15 Jun 2023
Fast Rates for Maximum Entropy Exploration
D. Tiapkin
Denis Belomestny
Daniele Calandriello
Eric Moulines
Rémi Munos
A. Naumov
Pierre Perrault
Yunhao Tang
Michal Valko
Pierre Menard
41
17
0
14 Mar 2023
Scalable Multi-Agent Reinforcement Learning with General Utilities
Donghao Ying
Yuhao Ding
Alec Koppel
Javad Lavaei
38
1
0
15 Feb 2023
Improved Policy Optimization for Online Imitation Learning
J. Lavington
Sharan Vaswani
Mark W. Schmidt
OffRL
18
6
0
29 Jul 2022
Active Exploration via Experiment Design in Markov Chains
Mojmír Mutný
Tadeusz Janik
Andreas Krause
41
14
0
29 Jun 2022
Off-Beat Multi-Agent Reinforcement Learning
Wei Qiu
Weixun Wang
R. Wang
Bo An
Yujing Hu
S. Obraztsova
Zinovi Rabinovich
Jianye Hao
Yingfeng Chen
Changjie Fan
OffRL
29
2
0
27 May 2022
Challenging Common Assumptions in Convex Reinforcement Learning
Mirco Mutti
Ric De Santi
Piersilvio De Bartolomeis
Marcello Restelli
OffRL
29
21
0
03 Feb 2022
The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs
Johannes Muller
Guido Montúfar
36
8
0
14 Oct 2021
Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint
M. Geist
Julien Pérolat
Mathieu Laurière
Romuald Elie
Sarah Perrin
Olivier Bachem
Rémi Munos
Olivier Pietquin
37
62
0
07 Jun 2021