2004.14309
Cited By
How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization
29 April 2020
P. D'Oro
Wojciech Jaśkowski
OffRL
Papers citing
"How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization"
6 / 6 papers shown
Title
Learning a Diffusion Model Policy from Rewards via Q-Score Matching
Michael Psenka
Alejandro Escontrela
Pieter Abbeel
Yi-An Ma
DiffM
91
23
0
17 Feb 2025
Compatible Gradient Approximations for Actor-Critic Algorithms
Baturay Saglam
Dionysis Kalogerias
19
0
0
02 Sep 2024
Off-Policy RL Algorithms Can be Sample-Efficient for Continuous Control via Sample Multiple Reuse
Jiafei Lyu
Le Wan
Zongqing Lu
Xiu Li
OffRL
26
9
0
29 May 2023
Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function
Ruijie Zheng
Xiyao Wang
Huazhe Xu
Furong Huang
38
13
0
02 Feb 2023
The Primacy Bias in Deep Reinforcement Learning
Evgenii Nikishin
Max Schwarzer
P. D'Oro
Pierre-Luc Bacon
Aaron C. Courville
OnRL
90
178
0
16 May 2022
A case for new neural network smoothness constraints
Mihaela Rosca
T. Weber
A. Gretton
S. Mohamed
AAML
25
48
0
14 Dec 2020