ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Provably Efficient Exploration in Policy Optimization

12 December 2019
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang

Papers citing "Provably Efficient Exploration in Policy Optimization"

40 / 90 papers shown
Title
Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL
Weitong Zhang
Jiafan He
Dongruo Zhou
Amy Zhang
Quanquan Gu
OffRL
24
11
0
22 Jun 2021
Randomized Exploration for Reinforcement Learning with General Value Function Approximation
Haque Ishfaq
Qiwen Cui
V. Nguyen
Alex Ayoub
Zhuoran Yang
Zhaoran Wang
Doina Precup
Lin F. Yang
37
43
0
15 Jun 2021
Bellman-consistent Pessimism for Offline Reinforcement Learning
Tengyang Xie
Ching-An Cheng
Nan Jiang
Paul Mineiro
Alekh Agarwal
OffRL
LRM
27
271
0
13 Jun 2021
The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces
Chi Jin
Qinghua Liu
Tiancheng Yu
26
50
0
07 Jun 2021
Sublinear Least-Squares Value Iteration via Locality Sensitive Hashing
Anshumali Shrivastava
Zhao Song
Zhaozhuo Xu
19
22
0
18 May 2021
Principled Exploration via Optimistic Bootstrapping and Backward Induction
Chenjia Bai
Lingxiao Wang
Lei Han
Jianye Hao
Animesh Garg
Peng Liu
Zhaoran Wang
OffRL
23
38
0
13 May 2021
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
Andrea Zanette
Ching-An Cheng
Alekh Agarwal
32
53
0
24 Mar 2021
An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap
Yuanhao Wang
Ruosong Wang
Sham Kakade
OffRL
39
43
0
23 Mar 2021
Softmax Policy Gradient Methods Can Take Exponential Time to Converge
Gen Li
Yuting Wei
Yuejie Chi
Yuxin Chen
31
50
0
22 Feb 2021
Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning
Luofeng Liao
Zuyue Fu
Zhuoran Yang
Yixin Wang
Mladen Kolar
Zhaoran Wang
OffRL
20
35
0
19 Feb 2021
Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs
Jiafan He
Dongruo Zhou
Quanquan Gu
95
24
0
17 Feb 2021
Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments
Amin Rakhsha
Xuezhou Zhang
Xiaojin Zhu
Adish Singla
AAML
OffRL
44
37
0
16 Feb 2021
Online Apprenticeship Learning
Lior Shani
Tom Zahavy
Shie Mannor
OffRL
29
25
0
13 Feb 2021
Optimization Issues in KL-Constrained Approximate Policy Iteration
N. Lazić
Botao Hao
Yasin Abbasi-Yadkori
Dale Schuurmans
Csaba Szepesvári
19
10
0
11 Feb 2021
Robust Policy Gradient against Strong Data Corruption
Xuezhou Zhang
Yiding Chen
Xiaojin Zhu
Wen Sun
AAML
40
37
0
11 Feb 2021
Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms
Chi Jin
Qinghua Liu
Sobhan Miryoosefi
OffRL
38
215
0
01 Feb 2021
Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP
Zihan Zhang
Jiaqi Yang
Xiangyang Ji
S. Du
71
38
0
29 Jan 2021
Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints
Chi Jin
Zhuoran Yang
Zhaoran Wang
OffRL
122
167
0
06 Jan 2021
Is Pessimism Provably Efficient for Offline RL?
Ying Jin
Zhuoran Yang
Zhaoran Wang
OffRL
27
350
0
30 Dec 2020
Learning Adversarial Markov Decision Processes with Delayed Feedback
Tal Lancewicki
Aviv A. Rosenberg
Yishay Mansour
43
32
0
29 Dec 2020
Policy Optimization as Online Learning with Mediator Feedback
Alberto Maria Metelli
Matteo Papini
P. D'Oro
Marcello Restelli
OffRL
27
10
0
15 Dec 2020
Regret Bounds for Adaptive Nonlinear Control
Nicholas M. Boffi
Stephen Tu
Jean-Jacques E. Slotine
41
47
0
26 Nov 2020
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
Zhuoran Yang
Chi Jin
Zhaoran Wang
Mengdi Wang
Michael I. Jordan
41
18
0
09 Nov 2020
Efficient Learning in Non-Stationary Linear Markov Decision Processes
Ahmed Touati
Pascal Vincent
42
29
0
24 Oct 2020
CoinDICE: Off-Policy Confidence Interval Estimation
Bo Dai
Ofir Nachum
Yinlam Chow
Lihong Li
Csaba Szepesvári
Dale Schuurmans
OffRL
27
84
0
22 Oct 2020
Sample Efficient Reinforcement Learning with REINFORCE
Junzi Zhang
Jongho Kim
Brendan O'Donoghue
Stephen P. Boyd
42
101
0
22 Oct 2020
Logistic Q-Learning
Joan Bas-Serrano
Sebastian Curi
Andreas Krause
Gergely Neu
14
40
0
21 Oct 2020
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
Zihan Zhang
Xiangyang Ji
S. Du
OffRL
34
104
0
28 Sep 2020
Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
Zuyue Fu
Zhuoran Yang
Zhaoran Wang
21
42
0
02 Aug 2020
Dynamic Regret of Policy Optimization in Non-stationary Environments
Yingjie Fei
Zhuoran Yang
Zhaoran Wang
Qiaomin Xie
32
54
0
30 Jun 2020
FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
Alekh Agarwal
Sham Kakade
A. Krishnamurthy
Wen Sun
OffRL
41
223
0
18 Jun 2020
Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension
Ruosong Wang
Ruslan Salakhutdinov
Lin F. Yang
23
55
0
21 May 2020
A Finite Time Analysis of Two Time-Scale Actor Critic Methods
Yue Wu
Weitong Zhang
Pan Xu
Quanquan Gu
92
146
0
04 May 2020
Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate
Yufeng Zhang
Qi Cai
Zhuoran Yang
Zhaoran Wang
116
12
0
08 Mar 2020
Exploration-Exploitation in Constrained MDPs
Yonathan Efroni
Shie Mannor
Matteo Pirotta
33
171
0
04 Mar 2020
Provably Efficient Safe Exploration via Primal-Dual Policy Optimization
Dongsheng Ding
Xiaohan Wei
Zhuoran Yang
Zhaoran Wang
M. Jovanović
25
159
0
01 Mar 2020
Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
Qiaomin Xie
Yudong Chen
Zhaoran Wang
Zhuoran Yang
41
124
0
17 Feb 2020
Adaptive Approximate Policy Iteration
Botao Hao
N. Lazić
Yasin Abbasi-Yadkori
Pooria Joulani
Csaba Szepesvári
18
14
0
08 Feb 2020
Reward-Free Exploration for Reinforcement Learning
Chi Jin
A. Krishnamurthy
Max Simchowitz
Tiancheng Yu
OffRL
112
194
0
07 Feb 2020
Optimism in Reinforcement Learning with Generalized Linear Function Approximation
Yining Wang
Ruosong Wang
S. Du
A. Krishnamurthy
137
135
0
09 Dec 2019