2111.11232
Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
22 November 2021
Yanwei Jia
X. Zhou
OffRL
Papers citing
"Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms"
37 / 37 papers shown
Title
Efficient Learning for Entropy-Regularized Markov Decision Processes via Multilevel Monte Carlo
Matthieu Meunier
C. Reisinger
Yufei Zhang
39
0
0
27 Mar 2025
Accuracy of Discretely Sampled Stochastic Policies in Continuous-time Reinforcement Learning
Yanwei Jia
Du Ouyang
Yufei Zhang
40
3
0
13 Mar 2025
Learning a Diffusion Model Policy from Rewards via Q-Score Matching
Michael Psenka
Alejandro Escontrela
Pieter Abbeel
Yi-An Ma
DiffM
89
23
0
17 Feb 2025
Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
Hanyang Zhao
Haoxian Chen
Ji Zhang
D. Yao
Wenpin Tang
55
0
0
03 Feb 2025
Exploratory Utility Maximization Problem with Tsallis Entropy
Chen Ziyi
Gu Jia-wen
53
0
0
03 Feb 2025
Reinforcement Learning for Jump-Diffusions, with Financial Applications
Xuefeng Gao
Lingfei Li
X. Zhou
39
1
0
08 Jan 2025
Robust Reinforcement Learning under Diffusion Models for Data with Jumps
Chenyang Jiang
Donggyu Kim
Alejandra Quintos
Yazhen Wang
72
0
0
18 Nov 2024
Regret of exploratory policy improvement and q-learning
Wenpin Tang
X. Zhou
39
0
0
02 Nov 2024
Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning
Harley Wiltzer
Marc G. Bellemare
D. Meger
Patrick Shafto
Yash Jhaveri
29
1
0
14 Oct 2024
On the grid-sampling limit SDE
Christian Bender
Nguyen Tran Thuan
16
1
0
10 Oct 2024
A random measure approach to reinforcement learning in continuous time
Christian Bender
Nguyen Tran Thuan
20
2
0
25 Sep 2024
Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning
Hanyang Zhao
Haoxian Chen
Ji Zhang
David D. Yao
Wenpin Tang
37
3
0
12 Sep 2024
Reward-Directed Score-Based Diffusion Models via q-Learning
Xuefeng Gao
Jiale Zha
X. Zhou
DiffM
28
2
0
07 Sep 2024
Exploratory Optimal Stopping: A Singular Control Formulation
Jodi Dianetti
Giorgio Ferrari
Renyuan Xu
26
3
0
18 Aug 2024
On Bellman equations for continuous-time policy evaluation I: discretization and approximation
Wenlong Mou
Yuhua Zhu
OffRL
29
2
0
08 Jul 2024
Reinforcement Learning for Intensity Control: An Application to Choice-Based Network Revenue Management
Huiling Meng
Ningyuan Chen
Xuefeng Gao
55
1
0
08 Jun 2024
Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching
R. Denkert
Huyen Pham
X. Warin
33
4
0
27 Apr 2024
Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty
Yanwei Jia
36
2
0
19 Apr 2024
Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial
Wenpin Tang
Hanyang Zhao
DiffM
36
23
0
12 Feb 2024
Discrete-Time Mean-Variance Strategy Based on Reinforcement Learning
Xiangyu Cui
Xun Li
Yun Shi
Si Zhao
27
1
0
24 Dec 2023
Data-driven optimal stopping: A pure exploration analysis
Soren Christensen
Niklas Dexheimer
C. Strauch
36
2
0
10 Dec 2023
Fast Policy Learning for Linear Quadratic Control with Entropy Regularization
Xin Guo
Xinyu Li
Renyuan Xu
34
3
0
23 Nov 2023
Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces
Andrea Angiuli
J. Fouque
Ruimeng Hu
Alan Raydan
30
5
0
19 Sep 2023
Actor critic learning algorithms for mean-field control with moment neural networks
Huyen Pham
X. Warin
30
5
0
08 Sep 2023
Continuous-time q-learning for mean-field control problems
Xiaoli Wei
Xian Yu
29
8
0
28 Jun 2023
Policy Optimization for Continuous Reinforcement Learning
Hanyang Zhao
Wenpin Tang
D. Yao
OffRL
24
17
0
30 May 2023
Actor-Critic learning for mean-field control in continuous time
N. Frikha
Maximilien Germain
Mathieu Laurière
H. Pham
Xuan Song
30
16
0
13 Mar 2023
Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off
Zichen Zhang
Johannes Kirschner
Junxi Zhang
Francesco Zanini
Alex Ayoub
Masood Dehghan
Dale Schuurmans
OffRL
11
3
0
17 Dec 2022
Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems
Michael Giegrich
Christoph Reisinger
Yufei Zhang
14
11
0
01 Nov 2022
Square-root regret bounds for continuous-time episodic Markov decision processes
Xuefeng Gao
X. Zhou
43
6
0
03 Oct 2022
Choquet regularization for reinforcement learning
Xia Han
Ruodu Wang
X. Zhou
21
2
0
17 Aug 2022
Optimal scheduling of entropy regulariser for continuous-time linear-quadratic reinforcement learning
Lukasz Szpruch
Tanut Treetanthiploet
Yufei Zhang
11
8
0
08 Aug 2022
q-Learning in Continuous Time
Yanwei Jia
X. Zhou
OffRL
40
67
0
02 Jul 2022
Logarithmic regret bounds for continuous-time average-reward Markov decision processes
Xuefeng Gao
X. Zhou
29
8
0
23 May 2022
Linear convergence of a policy gradient method for some finite horizon continuous time control problems
C. Reisinger
Wolfgang Stockinger
Yufei Zhang
16
5
0
22 Mar 2022
Recent Advances in Reinforcement Learning in Finance
B. Hambly
Renyuan Xu
Huining Yang
OffRL
24
165
0
08 Dec 2021
Deep Reinforcement Learning for Autonomous Driving: A Survey
B. R. Kiran
Ibrahim Sobh
V. Talpaert
Patrick Mannion
A. A. Sallab
S. Yogamani
P. Pérez
143
1,628
0
02 Feb 2020