1912.05830
Cited By
Provably Efficient Exploration in Policy Optimization
12 December 2019
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
Papers citing
"Provably Efficient Exploration in Policy Optimization"
50 / 90 papers shown
Title
Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Tong Yang
Bo Dai
Lin Xiao
Yuejie Chi
OffRL
69
2
0
13 Feb 2025
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
50
0
0
09 Feb 2025
Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning
Yen-Ru Lai
Fu-Chieh Chang
Pei-Yuan Wu
OffRL
81
1
0
22 Aug 2024
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
D. Tiapkin
Evgenii Chzhen
Gilles Stoltz
74
1
0
08 Jul 2024
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Asaf B. Cassel
Haipeng Luo
Aviv A. Rosenberg
Dmitry Sotnikov
OffRL
33
3
0
13 May 2024
Imitation Learning in Discounted Linear MDPs without exploration assumptions
Luca Viano
Stratis Skoulakis
V. Cevher
32
3
0
03 May 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
63
57
0
29 Apr 2024
DPO: A Differential and Pointwise Control Approach to Reinforcement Learning
Minh Nguyen
Chandrajit Bajaj
25
0
0
24 Apr 2024
Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation
Yu Chen
Xiangcheng Zhang
Siwei Wang
Longbo Huang
44
3
0
28 Feb 2024
Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji
Jiafan He
Quanquan Gu
29
17
0
14 Feb 2024
Settling the Sample Complexity of Online Reinforcement Learning
Zihan Zhang
Yuxin Chen
Jason D. Lee
S. Du
OffRL
98
22
0
25 Jul 2023
Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP
Jiacheng Guo
Zihao Li
Huazheng Wang
Mengdi Wang
Zhuoran Yang
Xuezhou Zhang
37
5
0
21 Jun 2023
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
Haque Ishfaq
Qingfeng Lan
Pan Xu
A. R. Mahmood
Doina Precup
Anima Anandkumar
Kamyar Azizzadenesheli
BDL
OffRL
30
20
0
29 May 2023
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
Han Zhong
Tong Zhang
35
26
0
15 May 2023
Does Sparsity Help in Learning Misspecified Linear Bandits?
Jialin Dong
Lin F. Yang
25
1
0
29 Mar 2023
Provably Efficient Reinforcement Learning via Surprise Bound
Hanlin Zhu
Ruosong Wang
Jason D. Lee
OffRL
28
5
0
22 Feb 2023
Reinforcement Learning with Function Approximation: From Linear to Nonlinear
Jihao Long
Jiequn Han
39
5
0
20 Feb 2023
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
Fang-yuan Kong
Xiangcheng Zhang
Baoxiang Wang
Shuai Li
31
12
0
14 Feb 2023
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
Uri Sherman
Tomer Koren
Yishay Mansour
32
12
0
30 Jan 2023
Refined Regret for Adversarial MDPs with Linear Function Approximation
Yan Dai
Haipeng Luo
Chen-Yu Wei
Julian Zimmert
31
12
0
30 Jan 2023
Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information
Zuyue Fu
Zhengling Qi
Zhuoran Yang
Zhaoran Wang
Lan Wang
OffRL
25
0
0
23 Dec 2022
On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
Thanh Nguyen-Tang
Ming Yin
Sunil R. Gupta
Svetha Venkatesh
R. Arora
OffRL
58
16
0
23 Nov 2022
Multi-armed Bandit Learning on a Graph
Tianpeng Zhang
Kasper Johansson
Na Li
33
6
0
20 Sep 2022
Dynamic Regret of Online Markov Decision Processes
Peng Zhao
Longfei Li
Zhi-Hua Zhou
OffRL
44
17
0
26 Aug 2022
Sampling Through the Lens of Sequential Decision Making
J. Dou
Alvin Pan
Runxue Bao
Haiyi Mao
Lei Luo
Zhi-Hong Mao
26
19
0
17 Aug 2022
Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
Shuang Qiu
Lingxiao Wang
Chenjia Bai
Zhuoran Yang
Zhaoran Wang
SSL
OffRL
26
32
0
29 Jul 2022
Regret Minimization and Convergence to Equilibria in General-sum Markov Games
Liad Erez
Tal Lancewicki
Uri Sherman
Tomer Koren
Yishay Mansour
42
25
0
28 Jul 2022
Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions
Shuang Qiu
Xiaohan Wei
Jieping Ye
Zhaoran Wang
Zhuoran Yang
OffRL
35
11
0
25 Jul 2022
Offline Reinforcement Learning with Differential Privacy
Dan Qiao
Yu Wang
OffRL
44
23
0
02 Jun 2022
Provably Efficient Kernelized Q-Learning
Shuang Liu
H. Su
MLT
27
4
0
21 Apr 2022
Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies
Zihan Zhang
Xiangyang Ji
S. Du
30
21
0
24 Mar 2022
Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism
Ming Yin
Yaqi Duan
Mengdi Wang
Yu Wang
OffRL
36
66
0
11 Mar 2022
Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets
Yifei Min
Tianhao Wang
Ruitu Xu
Zhaoran Wang
Michael I. Jordan
Zhuoran Yang
35
21
0
07 Mar 2022
Provably Efficient Causal Model-Based Reinforcement Learning for Systematic Generalization
Mirco Mutti
Ric De Santi
Emanuele Rossi
J. Calderón
Michael M. Bronstein
Marcello Restelli
36
14
0
14 Feb 2022
Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory
Ruiqi Zhang
Xuezhou Zhang
Chengzhuo Ni
Mengdi Wang
OffRL
35
16
0
10 Feb 2022
Policy Optimization for Stochastic Shortest Path
Liyu Chen
Haipeng Luo
Aviv A. Rosenberg
21
12
0
07 Feb 2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin
Tal Lancewicki
Haipeng Luo
Yishay Mansour
Aviv A. Rosenberg
74
21
0
31 Jan 2022
Exponential Family Model-Based Reinforcement Learning via Score Matching
Gen Li
Junbo Li
Anmol Kabra
Nathan Srebro
Zhaoran Wang
Zhuoran Yang
37
4
0
28 Dec 2021
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?
Han Zhong
Zhuoran Yang
Zhaoran Wang
Michael I. Jordan
34
30
0
27 Dec 2021
Nearly Optimal Policy Optimization with Stable at Any Time Guarantee
Tianhao Wu
Yunchang Yang
Han Zhong
Liwei Wang
S. Du
Jiantao Jiao
55
14
0
21 Dec 2021
Differentially Private Regret Minimization in Episodic Markov Decision Processes
Sayak Ray Chowdhury
Xingyu Zhou
29
21
0
20 Dec 2021
Perturbational Complexity by Distribution Mismatch: A Systematic Analysis of Reinforcement Learning in Reproducing Kernel Hilbert Space
Jihao Long
Jiequn Han
34
6
0
05 Nov 2021
Learning Stochastic Shortest Path with Linear Function Approximation
Yifei Min
Jiafan He
Tianhao Wang
Quanquan Gu
44
30
0
25 Oct 2021
False Correlation Reduction for Offline Reinforcement Learning
Zhihong Deng
Zuyue Fu
Lingxiao Wang
Zhuoran Yang
Chenjia Bai
Tianyi Zhou
Zhaoran Wang
Jing Jiang
OffRL
39
9
0
24 Oct 2021
Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes
Chonghua Liao
Jiafan He
Quanquan Gu
27
17
0
19 Oct 2021
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
Han Zhong
Zhuoran Yang
Zhaoran Wang
Csaba Szepesvári
47
21
0
18 Oct 2021
Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain
Jianye Hao
Tianpei Yang
Hongyao Tang
Chenjia Bai
Jinyi Liu
Zhaopeng Meng
Peng Liu
Zhen Wang
OffRL
41
93
0
14 Sep 2021
Efficient Local Planning with Linear Function Approximation
Dong Yin
Botao Hao
Yasin Abbasi-Yadkori
N. Lazić
Csaba Szepesvári
32
19
0
12 Aug 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
38
44
0
18 Jul 2021
Variance-Aware Off-Policy Evaluation with Linear Function Approximation
Yifei Min
Tianhao Wang
Dongruo Zhou
Quanquan Gu
OffRL
39
38
0
22 Jun 2021