2207.00713
Cited By
q-Learning in Continuous Time
2 July 2022
Yanwei Jia
X. Zhou
OffRL
Papers citing
"q-Learning in Continuous Time"
32 / 32 papers shown
Title
Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do
Yoav Wald
M. Goldstein
Yonathan Efroni
Wouter A. C. van Amsterdam
Rajesh Ranganath
CML
74
0
0
20 Mar 2025
Accuracy of Discretely Sampled Stochastic Policies in Continuous-time Reinforcement Learning
Yanwei Jia
Du Ouyang
Yufei Zhang
40
3
0
13 Mar 2025
Learning a Diffusion Model Policy from Rewards via Q-Score Matching
Michael Psenka
Alejandro Escontrela
Pieter Abbeel
Yi-An Ma
DiffM
89
23
0
17 Feb 2025
Exploratory Utility Maximization Problem with Tsallis Entropy
Chen Ziyi
Gu Jia-wen
53
0
0
03 Feb 2025
Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
Hanyang Zhao
Haoxian Chen
Ji Zhang
D. Yao
Wenpin Tang
55
0
0
03 Feb 2025
Reinforcement Learning for Jump-Diffusions, with Financial Applications
Xuefeng Gao
Lingfei Li
X. Zhou
39
1
0
08 Jan 2025
Robust Reinforcement Learning under Diffusion Models for Data with Jumps
Chenyang Jiang
Donggyu Kim
Alejandra Quintos
Yazhen Wang
72
0
0
18 Nov 2024
Regret of exploratory policy improvement and q-learning
Wenpin Tang
X. Zhou
39
0
0
02 Nov 2024
Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning
Harley Wiltzer
Marc G. Bellemare
D. Meger
Patrick Shafto
Yash Jhaveri
29
1
0
14 Oct 2024
On the grid-sampling limit SDE
Christian Bender
Nguyen Tran Thuan
16
1
0
10 Oct 2024
A random measure approach to reinforcement learning in continuous time
Christian Bender
Nguyen Tran Thuan
20
2
0
25 Sep 2024
Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning
Hanyang Zhao
Haoxian Chen
Ji Zhang
David D. Yao
Wenpin Tang
37
3
0
12 Sep 2024
Reward-Directed Score-Based Diffusion Models via q-Learning
Xuefeng Gao
Jiale Zha
X. Zhou
DiffM
26
2
0
07 Sep 2024
Exploratory Optimal Stopping: A Singular Control Formulation
Jodi Dianetti
Giorgio Ferrari
Renyuan Xu
26
3
0
18 Aug 2024
On Bellman equations for continuous-time policy evaluation I: discretization and approximation
Wenlong Mou
Yuhua Zhu
OffRL
29
2
0
08 Jul 2024
Continuous-time q-Learning for Jump-Diffusion Models under Tsallis Entropy
Lijun Bo
Yijie Huang
Xiang Yu
Tingting Zhang
39
3
0
04 Jul 2024
Reinforcement Learning for Intensity Control: An Application to Choice-Based Network Revenue Management
Huiling Meng
Ningyuan Chen
Xuefeng Gao
55
1
0
08 Jun 2024
Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching
R. Denkert
Huyen Pham
X. Warin
33
4
0
27 Apr 2024
On the stability of Lipschitz continuous control problems and its application to reinforcement learning
Namkyeong Cho
Yeoneung Kim
21
0
0
20 Apr 2024
Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty
Yanwei Jia
36
2
0
19 Apr 2024
Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial
Wenpin Tang
Hanyang Zhao
DiffM
36
23
0
12 Feb 2024
Discrete-Time Mean-Variance Strategy Based on Reinforcement Learning
Xiangyu Cui
Xun Li
Yun Shi
Si Zhao
27
1
0
24 Dec 2023
Data-Driven Merton's Strategies via Policy Randomization
Min Dai
Yuchao Dong
Yanwei Jia
Xun Yu Zhou
33
10
0
19 Dec 2023
Fast Policy Learning for Linear Quadratic Control with Entropy Regularization
Xin Guo
Xinyu Li
Renyuan Xu
34
3
0
23 Nov 2023
Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces
Andrea Angiuli
J. Fouque
Ruimeng Hu
Alan Raydan
30
5
0
19 Sep 2023
Continuous-time q-learning for mean-field control problems
Xiaoli Wei
Xiang Yu
29
8
0
28 Jun 2023
Policy Optimization for Continuous Reinforcement Learning
Hanyang Zhao
Wenpin Tang
D. Yao
OffRL
24
17
0
30 May 2023
Convergence of policy gradient methods for finite-horizon exploratory linear-quadratic control problems
Michael Giegrich
Christoph Reisinger
Yufei Zhang
14
11
0
01 Nov 2022
Square-root regret bounds for continuous-time episodic Markov decision processes
Xuefeng Gao
X. Zhou
40
6
0
03 Oct 2022
Choquet regularization for reinforcement learning
Xia Han
Ruodu Wang
X. Zhou
21
2
0
17 Aug 2022
Logarithmic regret bounds for continuous-time average-reward Markov decision processes
Xuefeng Gao
X. Zhou
29
8
0
23 May 2022
Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
Yanwei Jia
X. Zhou
OffRL
16
79
0
22 Nov 2021