arXiv:2103.12923
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
24 March 2021
Andrea Zanette
Ching-An Cheng
Alekh Agarwal
Papers citing "Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation"
48 / 48 papers shown
Title
Enhancing PPO with Trajectory-Aware Hybrid Policies
Qisai Liu
Zhanhong Jiang
Hsin-Jung Yang
Mahsa Khosravi
Joshua R. Waite
S. Sarkar
40
0
0
21 Feb 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning
Chen-Yu Wei
Christoph Dann
Julian Zimmert
77
44
0
31 Dec 2024
Dual Approximation Policy Optimization
Zhihan Xiong
Maryam Fazel
Lin Xiao
15
1
0
02 Oct 2024
Leveraging Unlabeled Data Sharing through Kernel Function Approximation in Offline Reinforcement Learning
Yen-Ru Lai
Fu-Chieh Chang
Pei-Yuan Wu
OffRL
64
1
0
22 Aug 2024
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
Dake Zhang
Boxiang Lyu
Shuang Qiu
Mladen Kolar
Tong Zhang
OffRL
25
0
0
10 Jul 2024
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
D. Tiapkin
Evgenii Chzhen
Gilles Stoltz
74
0
0
08 Jul 2024
Linear Bellman Completeness Suffices for Efficient Online Reinforcement Learning with Few Actions
Noah Golowich
Ankur Moitra
OffRL
25
1
0
17 Jun 2024
Corruption-Robust Offline Two-Player Zero-Sum Markov Games
Andi Nika
Debmalya Mandal
Adish Singla
Goran Radanović
OffRL
19
1
0
04 Mar 2024
Learning mirror maps in policy mirror descent
Carlo Alfano
Sebastian Towers
Silvia Sapora
Chris Xiaoxuan Lu
Patrick Rebeschini
17
0
0
07 Feb 2024
Sample Complexity Characterization for Linear Contextual MDPs
Junze Deng
Yuan-Chia Cheng
Shaofeng Zou
Yingbin Liang
25
1
0
05 Feb 2024
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chen Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
36
155
0
18 Dec 2023
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
Canzhe Zhao
Ruofeng Yang
Baoxiang Wang
Xuezhou Zhang
Shuai Li
22
2
0
14 Nov 2023
Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback
Haolin Liu
Chen-Yu Wei
Julian Zimmert
13
6
0
17 Oct 2023
Rate-Optimal Policy Optimization for Linear Markov Decision Processes
Uri Sherman
Alon Cohen
Tomer Koren
Yishay Mansour
23
7
0
28 Aug 2023
Provably Efficient Algorithm for Nonstationary Low-Rank MDPs
Yuan-Chia Cheng
J. Yang
Yitao Liang
OOD
25
1
0
10 Aug 2023
Provably Efficient UCB-type Algorithms For Learning Predictive State Representations
Ruiquan Huang
Yitao Liang
J. Yang
OffRL
16
5
0
01 Jul 2023
Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP
Jiacheng Guo
Zihao Li
Huazheng Wang
Mengdi Wang
Zhuoran Yang
Xuezhou Zhang
25
5
0
21 Jun 2023
On the Model-Misspecification in Reinforcement Learning
Yunfan Li
Lin F. Yang
28
5
0
19 Jun 2023
Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling
Yunfan Li
Yiran Wang
Y. Cheng
Lin F. Yang
OffRL
16
4
0
15 Jun 2023
Improving Offline RL by Blending Heuristics
Sinong Geng
Aldo Pacchiano
Andrey Kolobov
Ching-An Cheng
OffRL
17
7
0
01 Jun 2023
Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL
Qinghua Liu
Gellert Weisz
András György
Chi Jin
Csaba Szepesvári
OffRL
16
8
0
18 May 2023
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
Han Zhong
Tong Zhang
17
26
0
15 May 2023
Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs
Yuan-Chia Cheng
Ruiquan Huang
J. Yang
Yitao Liang
OffRL
37
8
0
20 Mar 2023
Best of Both Worlds Policy Optimization
Christoph Dann
Chen-Yu Wei
Julian Zimmert
11
12
0
18 Feb 2023
A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
Carlo Alfano
Rui Yuan
Patrick Rebeschini
54
15
0
30 Jan 2023
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
Uri Sherman
Tomer Koren
Yishay Mansour
9
12
0
30 Jan 2023
Refined Regret for Adversarial MDPs with Linear Function Approximation
Yan Dai
Haipeng Luo
Chen-Yu Wei
Julian Zimmert
16
12
0
30 Jan 2023
Sample Efficient Deep Reinforcement Learning via Local Planning
Dong Yin
S. Thiagarajan
N. Lazić
Nived Rajaraman
Botao Hao
Csaba Szepesvári
13
4
0
29 Jan 2023
Latent Variable Representation for Reinforcement Learning
Tongzheng Ren
Chenjun Xiao
Tianjun Zhang
Na Li
Zhaoran Wang
Sujay Sanghavi
Dale Schuurmans
Bo Dai
OffRL
11
10
0
17 Dec 2022
CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous Control
Xiang Zheng
Xingjun Ma
Cong Wang
16
1
0
28 Nov 2022
Linear Convergence for Natural Policy Gradient with Log-linear Policy Parametrization
Carlo Alfano
Patrick Rebeschini
49
13
0
30 Sep 2022
Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
Shuang Qiu
Lingxiao Wang
Chenjia Bai
Zhuoran Yang
Zhaoran Wang
SSL
OffRL
8
32
0
29 Jul 2022
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs
Dongsheng Ding
K. Zhang
Jiali Duan
Tamer Başar
Mihailo R. Jovanović
13
19
0
06 Jun 2022
Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning
Andrea Zanette
Martin J. Wainwright
OOD
19
5
0
01 Jun 2022
Provable Benefits of Representational Transfer in Reinforcement Learning
Alekh Agarwal
Yuda Song
Wen Sun
Kaiwen Wang
Mengdi Wang
Xuezhou Zhang
OffRL
16
33
0
29 May 2022
Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach
Xuezhou Zhang
Yuda Song
Masatoshi Uehara
Mengdi Wang
Alekh Agarwal
Wen Sun
OffRL
11
57
0
31 Jan 2022
Nearly Optimal Policy Optimization with Stable at Any Time Guarantee
Tianhao Wu
Yunchang Yang
Han Zhong
Liwei Wang
S. Du
Jiantao Jiao
23
14
0
21 Dec 2021
On the Global Optimum Convergence of Momentum-based Policy Gradient
Yuhao Ding
Junzi Zhang
Javad Lavaei
17
16
0
19 Oct 2021
Representation Learning for Online and Offline RL in Low-rank MDPs
Masatoshi Uehara
Xuezhou Zhang
Wen Sun
OffRL
25
125
0
09 Oct 2021
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
Andrea Zanette
Martin J. Wainwright
Emma Brunskill
OffRL
11
111
0
19 Aug 2021
Efficient Local Planning with Linear Function Approximation
Dong Yin
Botao Hao
Yasin Abbasi-Yadkori
N. Lazić
Csaba Szepesvári
27
18
0
12 Aug 2021
Design of Experiments for Stochastic Contextual Linear Bandits
Andrea Zanette
Kefan Dong
Jonathan Lee
Emma Brunskill
OffRL
17
17
0
21 Jul 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
14
44
0
18 Jul 2021
MADE: Exploration via Maximizing Deviation from Explored Regions
Tianjun Zhang
Paria Rashidinejad
Jiantao Jiao
Yuandong Tian
Joseph E. Gonzalez
Stuart J. Russell
OffRL
13
42
0
18 Jun 2021
Corruption-Robust Offline Reinforcement Learning
Xuezhou Zhang
Yiding Chen
Jerry Zhu
Wen Sun
OffRL
18
39
0
11 Jun 2021
Navigating to the Best Policy in Markov Decision Processes
Aymen Al Marjani
Aurélien Garivier
Alexandre Proutière
14
20
0
05 Jun 2021
Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints
Chi Jin
Zhuoran Yang
Zhaoran Wang
OffRL
107
166
0
06 Jan 2021
Optimism in Reinforcement Learning with Generalized Linear Function Approximation
Yining Wang
Ruosong Wang
S. Du
A. Krishnamurthy
127
135
0
09 Dec 2019